














Psychometrika

















CONTENTS
ON METHODS IN THE ANALYSIS OF PROFILE DATA .......... 95
SAMUEL W. GREENHOUSE AND SEYMOUR GEISSER
DOMAIN SAMPLING FORMULATION OF CLUSTER AND FACTOR ANALYSIS .......... 113
Robert C. Tryon
LEAST SQUARES ESTIMATION IN FINITE MARKOV PROCESSES .......... 137
Albert Madansky
AN AUGMENTED MODEL FOR SPONTANEOUS REGRESSION .......... 145
David McConnell
A MODEL FOR ORDERED METRIC SCALING BY COMPARISON OF INTERVALS .......... 157
Robert F. Fagot
A NOTE ON FACTOR ANALYSIS: ARBITRARY ORTHOGONAL TRANSFORMATIONS .......... 169
Edward E. Cureton
RANDOMLY PARALLEL TESTS AND LYERLY'S ASSUMPTION FOR THE KUDER-RICHARDSON FORMULA .......... 175
Frederic M. Lord
ESTIMATING ITEM INDICES BY NOMOGRAPHS .......... 179
Robert M. Colver
BOOK REVIEW
Oléron, Pierre. Les Composantes de l'Intelligence d'après les Recherches Factorielles .......... 187
Review by R. Darrell Bock
VOLUME TWENTY-FOUR    JUNE 1959    NUMBER 2














PSYCHOMETRIKA—VOL. 24, NO. 2 
JUNE, 1959 


ON METHODS IN THE ANALYSIS OF PROFILE DATA 


SAMUEL W. GREENHOUSE AND SEYMOUR GEISSER*


NATIONAL INSTITUTE OF MENTAL HEALTH


This paper is concerned with methods for analyzing quantitative, non- 
categorical profile data, e.g., a battery of tests given to individuals in one or 
more groups. It is assumed that the variables have a multinormal distribution 
with an arbitrary variance-covariance matrix. Approximate procedures based 
on classical analysis of variance are presented, including an adjustment to 
the degrees of freedom resulting in conservative F tests. These can be applied 
to the case where the variance-covariance matrices differ from group to 
group. In addition, exact generalized multivariate analysis methods are
discussed. Examples are given illustrating both techniques.


Much research in the social sciences is of the multivariate type; multiple 
observations are made on individuals who have been sampled from one or 
more populations. In particular, when the observations are in the form of a 
battery of tests or a set of items, there is the problem of profile analysis, 
wherein it is customary to test for differences in the levels and in the shapes 
of the group profiles. If the variables being observed are assigned to columns 
and the individuals to rows, the resulting matrix of observations is very 
suggestive of the data usually analyzed by analysis of variance. Furthermore, 
since the rows are random and the columns can be considered in almost all 
instances as fixed, the appropriate model is the mixed model. 

As is well known, in order that the usually computed ratios of mean 
squares in this model [7, 14, 16] be exactly distributed as the F distribution, 
it is necessary that columns (variables), in addition to being normally dis- 
tributed, have equal variances and be mutually independent or, at most, 
have equal correlations. But these assumptions seem much too restrictive. 
In most investigations, it is unrealistic to assume that three or more tests, 
items, or treatment schedules have the same pairwise correlations or that 
they have the same variances. It seemed obvious, therefore, that this problem 
of multiple observations should be considered in its greatest generality, 
namely, that an individual vector $x_1, x_2, \ldots, x_p$ is sampled from a p-variate
normal distribution with an arbitrary variance-covariance matrix. 

Exact procedures for analyzing data of this type have been known for 
some time and are usually referred to as the generalized multivariate analysis 
of variance [1, 10, 12, 13, 17]. These, however, require considerably more
computation than the arithmetic of the analysis of variance demands.
Furthermore, an analysis of variance approach permits the analysis of a set
of data which cannot be handled by multivariate procedures, namely, the
case where n, the number of random vectors, is less than p, the number of
variables. Although these multivariate methods are discussed subsequently
and an example is given for the case of two groups, our main purpose is to
utilize the simpler, and more familiar, conventional univariate analysis of
variance techniques under the more general assumptions. Our results
concerning the approximate distributions of the F statistics are based upon
the work of Box [5, 6] with regard to one group and its extension, by Geisser
and Greenhouse [8], to several groups. In addition, the latter have found
certain adjustments to the approximate tests leading to conservative tests
which can be used, when the group sample sizes are the same, in the case of
unequal variance-covariance matrices among the groups.

*We are indebted to Mrs. Norma French for performing all the calculations
appearing in this paper.

It is of interest that Block, Levine, and McNemar [2] were also primarily 
concerned with the application of the analysis of variance to the profile 
problem. They presented F tests for testing the homogeneity of variable 
(columns) means, the homogeneity of over-all group means (profile levels) 
and the equality of profile shapes. However, they assumed equal variances
among the variables and, since they imply that the F tests are exact, it
can only be inferred that they also assumed the variables to be independent 
or equally correlated. 


The Problem 
Our notation is almost identical to that used by Block, Levine, and
McNemar. Let p tests, $x_1, x_2, \ldots, x_p$, be given to each of $n_k$ individuals
($k = 1, 2, \ldots, g$) in each of g established groups. Assume both the p tests
and the g groups to be fixed, i.e., they are not random elements sampled
from larger populations. This model, which fixes interest in the tests and
groups under study, conforms to many experimental situations met with in
practice. The totality of Np observed scores ($N = \sum_{k=1}^{g} n_k$) can be classified
according to the scheme in the table below.

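An individual i in group k has the profile

$$(x_{i1k}, \ldots, x_{ijk}, \ldots, x_{ipk}), \qquad i = 1, \ldots, n_k \text{ individuals in group } k; \quad j = 1, \ldots, p \text{ variables}; \quad k = 1, \ldots, g \text{ groups}.$$

And the group profile for group k, say, is represented by

$$(\bar{x}_{.1k}, \bar{x}_{.2k}, \ldots, \bar{x}_{.pk}).$$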














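$$
\begin{array}{ll|ccccc}
 & & \multicolumn{5}{c}{\text{Tests}} \\
\text{Group} & \text{Ind.} & x_1 & \cdots & x_j & \cdots & x_p \\
\hline
1 & 1 & x_{111} & \cdots & x_{1j1} & \cdots & x_{1p1} \\
 & \vdots & \vdots & & \vdots & & \vdots \\
 & n_1 & x_{n_1 11} & \cdots & x_{n_1 j1} & \cdots & x_{n_1 p1} \\
\multicolumn{2}{l|}{\text{Means: Gr. } 1} & \bar{x}_{.11} & \cdots & \bar{x}_{.j1} & \cdots & \bar{x}_{.p1} \\
\hline
k & 1 & x_{11k} & \cdots & x_{1jk} & \cdots & x_{1pk} \\
 & \vdots & \vdots & & \vdots & & \vdots \\
 & i & x_{i1k} & \cdots & x_{ijk} & \cdots & x_{ipk} \\
 & \vdots & \vdots & & \vdots & & \vdots \\
 & n_k & x_{n_k 1k} & \cdots & x_{n_k jk} & \cdots & x_{n_k pk} \\
\multicolumn{2}{l|}{\text{Means: Gr. } k} & \bar{x}_{.1k} & \cdots & \bar{x}_{.jk} & \cdots & \bar{x}_{.pk} \\
\hline
g & 1 & x_{11g} & \cdots & x_{1jg} & \cdots & x_{1pg} \\
 & \vdots & \vdots & & \vdots & & \vdots \\
 & n_g & x_{n_g 1g} & \cdots & x_{n_g jg} & \cdots & x_{n_g pg} \\
\multicolumn{2}{l|}{\text{Means: Gr. } g} & \bar{x}_{.1g} & \cdots & \bar{x}_{.jg} & \cdots & \bar{x}_{.pg} \\
\hline
\multicolumn{2}{l|}{\text{Means: All Groups}} & \bar{x}_{.1.} & \cdots & \bar{x}_{.j.} & \cdots & \bar{x}_{.p.}
\end{array}
$$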


Assume that each individual profile is a random vector sampled from a 
p-variate normal distribution with an arbitrary variance-covariance matrix, 


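$$
\Sigma =
\begin{bmatrix}
\sigma_1^2 & \rho_{12}\sigma_1\sigma_2 & \cdots & \rho_{1p}\sigma_1\sigma_p \\
\rho_{12}\sigma_1\sigma_2 & \sigma_2^2 & \cdots & \rho_{2p}\sigma_2\sigma_p \\
\vdots & \vdots & & \vdots \\
\rho_{1p}\sigma_1\sigma_p & \rho_{2p}\sigma_2\sigma_p & \cdots & \sigma_p^2
\end{bmatrix}
=
\begin{bmatrix}
\sigma_{11} & \sigma_{12} & \cdots & \sigma_{1p} \\
\sigma_{12} & \sigma_{22} & \cdots & \sigma_{2p} \\
\vdots & \vdots & & \vdots \\
\sigma_{1p} & \sigma_{2p} & \cdots & \sigma_{pp}
\end{bmatrix}.
$$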


Also assume that the p variables have the same metric. This is necessary
to give meaning to the question of whether the group profiles have the same
shape, and not because of any statistical considerations. This restriction
results in no loss of generality if there already exists a large body of data
on these p tests so that the standard deviations can be assumed known.
For in this instance, equal metrics can be obtained by standardization of the
p test scores.

The questions that are most often asked in profile analysis are: 

(i) Are the groups on the same level, i.e., do the groups arise from
populations having the same group means, namely,
$E(\bar{x}_{..1}) = E(\bar{x}_{..2}) = \cdots = E(\bar{x}_{..g})$, where E denotes the expectation?

(ii) Do the groups have the same shape, i.e., do the groups arise from 
populations having parallel group profiles? 

Another question that may be asked of these data, although not too 
frequently in profile analysis, is whether the p tests have the same means. 

With regard to the question on shape, it becomes necessary to define a 
statistic which reflects the concept of equally shaped group profiles. In 
a larger sense, profiles having the same shape can be considered to be parallel 
curves. As Box [4] and Block, Levine, and McNemar [2] point out, parallelism 
can be measured by the group-test interaction mean square. That is, if the 
curves are parallel, the group-test interaction should be zero and the mean 
square should not differ significantly from an appropriate error mean square. 
If, on the other hand, the curves have different shapes, the interaction mean 
square should be significantly greater than the error mean square. 

This is made clear by reference to two group profiles:

$$\bar{x}_{.11}, \bar{x}_{.21}, \ldots, \bar{x}_{.p1} \qquad \text{and} \qquad \bar{x}_{.12}, \bar{x}_{.22}, \ldots, \bar{x}_{.p2}.$$

Denote the corresponding differences between group means for each test by

$$d_j = \bar{x}_{.j1} - \bar{x}_{.j2}, \qquad j = 1, 2, \ldots, p.$$

If the two profiles are parallel, it is clear that $d_1 = d_2 = d_3 = \cdots = d_p$.
Conversely, if $d_1 = d_2 = d_3 = \cdots = d_p$, then the two profiles must be
parallel. Hence, a necessary and sufficient condition that the two group
profiles possess the same shape is that

$$d_1 = d_2 = d_3 = \cdots = d_p.$$

But the equality of these differences is exactly what is meant by no interaction
between groups and tests, and the extent to which these differences are
unequal corresponds to the existence of the group-test interaction. Therefore
a test of the group-test interaction is also a test of whether group profiles
have the same shape.
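The equivalence between parallel profiles and zero interaction can be checked numerically. The following is a minimal plain-Python sketch; the function names and the two pairs of profiles are hypothetical illustrations, not data from this paper. It computes the group-test interaction sum of squares directly from group mean profiles, weighting each group by its size $n_k$:

```python
def interaction_ss(profiles, sizes):
    """Group x test interaction sum of squares computed from group mean profiles.

    profiles[k][j] is the mean of test j in group k; sizes[k] is n_k.
    """
    p = len(profiles[0])
    n_total = sum(sizes)
    group_means = [sum(row) / p for row in profiles]              # profile levels
    test_means = [sum(n * row[j] for n, row in zip(sizes, profiles)) / n_total
                  for j in range(p)]                               # weighted over groups
    grand = sum(n * m for n, m in zip(sizes, group_means)) / n_total
    return sum(n * (row[j] - gm - test_means[j] + grand) ** 2
               for n, row, gm in zip(sizes, profiles, group_means)
               for j in range(p))


def differences(profiles):
    """d_j = mean of test j in group 1 minus mean of test j in group 2."""
    return [a - b for a, b in zip(profiles[0], profiles[1])]


parallel = [[10.0, 12.0, 9.0], [13.0, 15.0, 12.0]]   # hypothetical: every d_j = -3
crossed = [[10.0, 12.0, 9.0], [13.0, 11.0, 12.0]]    # hypothetical: d = (-3, 1, -3)

for name, prof in (("parallel", parallel), ("crossed", crossed)):
    print(name, differences(prof), interaction_ss(prof, [8, 8]))
```

With two groups of equal size n, the residual for test j is plus or minus $(d_j - \bar{d})/2$, so the interaction sum of squares reduces to $(n/2)\sum_j (d_j - \bar{d})^2$, which vanishes exactly when all $d_j$ are equal.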


Tests of Significance in the Mixed Model


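If the p test scores have equal variances and are independent (or, at
most, are equally correlated in pairs), so that

$$
\Sigma =
\begin{bmatrix}
\sigma^2 & 0 & \cdots & 0 \\
0 & \sigma^2 & \cdots & 0 \\
\vdots & \vdots & & \vdots \\
0 & 0 & \cdots & \sigma^2
\end{bmatrix}
\qquad \text{or} \qquad
\Sigma = \sigma^2
\begin{bmatrix}
1 & \rho & \cdots & \rho \\
\rho & 1 & \cdots & \rho \\
\vdots & \vdots & & \vdots \\
\rho & \rho & \cdots & 1
\end{bmatrix},
$$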


then the given scheme constitutes the classical mixed model for g samples,
with proportionate numbers of observations among the samples. The
appropriate analysis of variance breakdown is shown in Table 1. The analysis
under either of the above assumptions on the covariance matrix follows
along classical lines. The $F_1$, $F_2$, and $F_3$ statistics used to test hypotheses
of homogeneity of test (variable) means, of group means (level), and the
nonexistence of a group-test interaction (equal shapes of group profiles),
respectively, are exact.

If, on the other hand, the validity of these two models is suspect, on 
the basis either of prior evidence or of a statistical test, the given F ratios 
are not distributed like the tabulated F distribution. In this situation where 
the covariance matrix is assumed to be arbitrary and given by Σ, Roy [13],
Rao [12], and others have approached the problem through the multivariate 
analysis of variance. However, it is of interest, and possibly of considerable 
practical importance, to investigate the distribution of the computed F 
statistics. 


Tests of Significance for Arbitrary Covariance Matrix 


Geisser and Greenhouse [8], in extending to several groups Box's work
[5, 6] relative to one group, have shown that $Q_1$ and $Q_4$ are each independent
of $Q_5$, and $Q_2$ is independent of $Q_3$. They have also shown that, under the
null hypothesis,

$$E(Q_1) = A \text{ (say)}, \qquad E(Q_4) = (g-1)A, \qquad E(Q_5) = (N-g)A,$$

and

$$E(Q_2) = (g-1)B \text{ (say)}, \qquad E(Q_3) = (N-g)B.$$
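Table 2 gives the mean square (M.S.) and the expectation of the mean
square (E.M.S.) for each of the five sources of variation in the analysis
of variance.

TABLE 1
Analysis of Variance

$$
\begin{array}{llll}
\text{Source} & \text{d.f.} & \text{Sum of squares} & F \\
\hline
\text{Tests} & p-1 & Q_1 = N\sum_{j}(\bar{x}_{.j.}-\bar{x}_{...})^2 & F_1 = \dfrac{(N-g)Q_1}{Q_5} \\
\text{Groups} & g-1 & Q_2 = p\sum_{k} n_k(\bar{x}_{..k}-\bar{x}_{...})^2 & F_2 = \dfrac{(N-g)Q_2}{(g-1)Q_3} \\
\text{Individuals (within groups)} & N-g & Q_3 = p\sum_{k}\sum_{i}(\bar{x}_{i.k}-\bar{x}_{..k})^2 & \\
\text{Groups} \times \text{Tests} & (p-1)(g-1) & Q_4 = \sum_{k}\sum_{j} n_k(\bar{x}_{.jk}-\bar{x}_{..k}-\bar{x}_{.j.}+\bar{x}_{...})^2 & F_3 = \dfrac{(N-g)Q_4}{(g-1)Q_5} \\
\text{Indiv.} \times \text{Tests (within groups)} & (p-1)(N-g) & Q_5 = \sum_{k}\sum_{i}\sum_{j}(x_{ijk}-\bar{x}_{i.k}-\bar{x}_{.jk}+\bar{x}_{..k})^2 & \\
\hline
\text{Total} & Np-1 & \sum_{k}\sum_{i}\sum_{j}(x_{ijk}-\bar{x}_{...})^2 &
\end{array}
$$

(Sums run over $k = 1, \ldots, g$; $i = 1, \ldots, n_k$; $j = 1, \ldots, p$.)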
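The partition in Table 1 can be sketched in plain Python as follows. The nested-list layout `data[k][i][j]` for $x_{ijk}$ and the function name are assumptions made for illustration. Because every individual takes all p tests, the five sums of squares add up to the total even when the group sizes differ, and $F_2$ coincides with a one-way analysis of variance on the individuals' mean scores.

```python
def profile_anova(data):
    """Sums of squares Q1..Q5, total, and F ratios of Table 1.

    data[k][i][j] is the score of individual i in group k on test j;
    group sizes n_k may differ, but every individual takes all p tests.
    """
    g, p = len(data), len(data[0][0])
    sizes = [len(group) for group in data]
    N = sum(sizes)
    grand = sum(x for group in data for row in group for x in row) / (N * p)
    group_means = [sum(map(sum, group)) / (len(group) * p) for group in data]
    test_means = [sum(row[j] for group in data for row in group) / N for j in range(p)]
    cell_means = [[sum(row[j] for row in group) / len(group) for j in range(p)]
                  for group in data]

    q1 = N * sum((m - grand) ** 2 for m in test_means)
    q2 = p * sum(n * (m - grand) ** 2 for n, m in zip(sizes, group_means))
    q3 = p * sum((sum(row) / p - group_means[k]) ** 2
                 for k, group in enumerate(data) for row in group)
    q4 = sum(sizes[k] * (cell_means[k][j] - group_means[k] - test_means[j] + grand) ** 2
             for k in range(g) for j in range(p))
    q5 = sum((row[j] - sum(row) / p - cell_means[k][j] + group_means[k]) ** 2
             for k, group in enumerate(data) for row in group for j in range(p))
    total = sum((x - grand) ** 2 for group in data for row in group for x in row)

    f1 = (N - g) * q1 / q5                # tests
    f2 = (N - g) * q2 / ((g - 1) * q3)    # groups (levels)
    f3 = (N - g) * q4 / ((g - 1) * q5)    # groups x tests (shapes)
    return (q1, q2, q3, q4, q5), total, (f1, f2, f3)


# Hypothetical scores: two groups (3 and 2 individuals), three tests.
example = [
    [[3, 5, 4], [4, 7, 5], [2, 4, 4]],
    [[6, 6, 8], [5, 7, 9]],
]
qs, total, fs = profile_anova(example)
print(qs, total, fs)
```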
TABLE 2
Analysis of Variance

$$
\begin{array}{lll}
\text{Source} & \text{M.S.} & \text{E.M.S.} \\
\hline
\text{Tests} & (p-1)^{-1}Q_1 & (p-1)^{-1}A \\
\text{Groups} & (g-1)^{-1}Q_2 & B \\
\text{Individuals within Groups} & (N-g)^{-1}Q_3 & B \\
\text{Groups} \times \text{Tests} & (p-1)^{-1}(g-1)^{-1}Q_4 & (p-1)^{-1}A \\
\text{Individuals} \times \text{Tests within Groups} & (p-1)^{-1}(N-g)^{-1}Q_5 & (p-1)^{-1}A
\end{array}
$$

From the results presented in Tables 1 and 2, it follows that each of the
three F ratios, $F_1$, $F_2$, and $F_3$, is a ratio of two independent mean squares
with the same expectations under the null hypothesis. Making use of the
fact that each of the quadratic forms involved in the three F statistics is
exactly distributed like a linear sum of independent $\chi^2$ variables with the
same degrees of freedom (theorem 6.1, Box [5, 6]), $F_1$ is approximately
distributed like $F[(p-1)\epsilon, (p-1)(N-g)\epsilon]$, $F_3$ is approximately distributed
like $F[(p-1)(g-1)\epsilon, (p-1)(N-g)\epsilon]$, and $F_2$ is exactly distributed like
$F(g-1, N-g)$, where


$$\epsilon = \frac{p^2(\bar{\sigma}_{tt} - \bar{\sigma}_{..})^2}{(p-1)\left(\sum_t\sum_s \sigma_{ts}^2 - 2p\sum_t \bar{\sigma}_{t.}^2 + p^2\bar{\sigma}_{..}^2\right)};$$

$\sigma_{ts}$ are the elements of the matrix Σ, $\bar{\sigma}_{tt}$ is the mean of the diagonal terms,
$\bar{\sigma}_{t.}$ is the mean of the tth row (or tth column), and $\bar{\sigma}_{..}$ is the grand mean.
Thus, the effect of the arbitrary variance-covariance matrix, which must be
the same from group to group, is to assess the significance of the $F_1$ and $F_3$
statistics in the ordinary tabulated F distribution but with reduced degrees
of freedom. The $F_2$ test on group means, it will be noted, remains unchanged
from the standard F test since it results from a one-way analysis of variance
with all observations having the same variance.

The reduction in the degrees of freedom for this approximate test is a 
function of the elements of the population variance-covariance matrix. 
This is almost never known, and therefore ε will have to be estimated from
the sample variances and covariances. However, the effect of using an
estimated ε on the approximate F distributions involved is unknown. Hence,
unless the variance-covariance matrix is estimated with a large number of 
degrees of freedom, use of the conservative test given below is suggested. 
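The expression for ε translates directly into code. This is a plain-Python sketch (the function name is ours) that takes a covariance matrix as a list of rows. The matrix `S` below is the pooled matrix of the maternal-attitude example later in the paper, as read from the printed table; applying the function to it should give a value close to the estimate ε̂ = .8194 reported there.

```python
def box_epsilon(sigma):
    """Box / Greenhouse-Geisser epsilon for a p x p covariance matrix (list of rows)."""
    p = len(sigma)
    diag_mean = sum(sigma[t][t] for t in range(p)) / p          # mean of diagonal terms
    row_means = [sum(row) / p for row in sigma]                  # mean of each row
    grand_mean = sum(row_means) / p                              # grand mean
    sum_sq = sum(x * x for row in sigma for x in row)            # sum over t, s of sigma_ts^2
    numerator = p ** 2 * (diag_mean - grand_mean) ** 2
    denominator = (p - 1) * (sum_sq - 2 * p * sum(m * m for m in row_means)
                             + p ** 2 * grand_mean ** 2)
    return numerator / denominator


# Pooled matrix S from the maternal-attitude example, as read from the printed table.
S = [
    [3.100, 0.101, -0.279, -0.083, -0.009, 1.557],
    [0.101, 5.780, 1.018, -0.114, -1.014, 0.039],
    [-0.279, 1.018, 5.560, 1.039, 1.366, -0.169],
    [-0.083, -0.114, 1.039, 5.600, 3.080, 0.258],
    [-0.009, -1.014, 1.366, 3.080, 6.820, 0.222],
    [1.557, 0.039, -0.169, 0.258, 0.222, 5.170],
]
eps_hat = box_epsilon(S)
print(round(eps_hat, 4), "lower bound:", 1 / (len(S) - 1))
```

For an identity matrix, or any matrix with equal variances and equal covariances, the function returns 1, so the ordinary degrees of freedom are recovered.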








A Conservative Test

The preceding approximate procedure requires some computations on
the elements of a known variance-covariance matrix. In many profile problems,
the number of tests may be as high as 50 if not more. This results in a 50 × 50
matrix, necessitating some laborious arithmetic. Furthermore, in almost all
problems variances and covariances are unknown, and the extent to which
ε is changed by using sample estimates has not been investigated. As a
result it is useful to obtain a lower bound on ε; it can be shown that

$$\epsilon \ge \frac{1}{p-1}.$$

This minimum value of ε is independent of the elements of the
variance-covariance matrix.

With this new correction to the degrees of freedom, the $F_1$ and $F_3$ statistics
are now judged for significance by entering the tabulated F distribution
with 1 and N − g degrees of freedom and with g − 1 and N − g degrees of
freedom, respectively. These tests are called conservative since the minimum
value of ε gives the maximum reduction in degrees of freedom.
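One way to see why the bound holds is sketched below in matrix notation that is not used in the original: $H = I - p^{-1}J$, with J the p × p matrix of ones, is introduced here only for the argument. The doubly centred matrix $H\Sigma H$ has elements $\sigma_{ts} - \bar{\sigma}_{t.} - \bar{\sigma}_{s.} + \bar{\sigma}_{..}$ and annihilates the vector of ones, so at most p − 1 of its eigenvalues $\lambda_1, \ldots, \lambda_{p-1}$ are nonzero, and all are nonnegative. The expression for ε can then be rewritten as

$$
\epsilon
= \frac{[\operatorname{tr}(H\Sigma H)]^2}{(p-1)\,\operatorname{tr}[(H\Sigma H)^2]}
= \frac{\left(\sum_{i=1}^{p-1}\lambda_i\right)^2}{(p-1)\sum_{i=1}^{p-1}\lambda_i^2},
\qquad
\operatorname{tr}(H\Sigma H) = p(\bar{\sigma}_{tt}-\bar{\sigma}_{..}),
\qquad
\operatorname{tr}[(H\Sigma H)^2] = \sum_t\sum_s\sigma_{ts}^2 - 2p\sum_t\bar{\sigma}_{t.}^2 + p^2\bar{\sigma}_{..}^2 .
$$

Because the $\lambda_i$ are nonnegative, $\left(\sum\lambda_i\right)^2 \ge \sum\lambda_i^2$, so $\epsilon \ge 1/(p-1)$, with equality when only one $\lambda_i$ is nonzero. By the Cauchy-Schwarz inequality, $\left(\sum\lambda_i\right)^2 \le (p-1)\sum\lambda_i^2$, so $\epsilon \le 1$, with equality when all $\lambda_i$ are equal, as under equal variances and equal covariances.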


An Example 


Five groups of mothers, classified into their groups according to some 
external criteria, were given a maternal attitude questionnaire containing 
23 scales. For purposes of this illustration, six of these scales have been 
selected. Thus p = 6, g = 5, and N = 128. The group profiles and group 
means are given in Table 3. 

TABLE 3

Mean Profiles for Five Groups of Mothers on Selected Scales of a
Maternal Attitude Questionnaire*

              No. of                         Scale                          Group
Groups        Mothers     1       3       6       9       13      14       Mean
A               59      17.02   10.97   13.24   11.47    9.80   15.44     12.99
B               13      17.92   13.85   17.23   14.00   12.23   17.38     15.44
C               15      18.87   11.60   14.13    8.93    8.27   17.73     13.26
D               32      16.75   14.47   15.41   11.78    9.91   15.94     14.04
E                9      18.33   10.78   13.89   14.44   12.11   18.78     14.72
All Groups     128      17.35   12.20   14.34   11.72   10.05   16.27     13.65

*We are indebted to Dr. Richard Q. Bell of the Laboratory of Psychology, National
Institute of Mental Health, for permitting us to use part of his data for this
example.

The five variance-covariance matrices were first tested for homogeneity.
The likelihood ratio test, the multivariate analogue of Bartlett's test for
homogeneity of variances, can be found in Box [3, 4]. (Kullback [11] derives
an equivalent test through information theory.) The test statistic is


$$M = N \log_e |S| - \sum_{i=1}^{5} n_i \log_e |S_i| = 112.6565.$$

In the above, |S| is the determinant of the pooled variance-covariance
matrix, and $|S_i|$ is the determinant of the sample variance-covariance
matrix in the ith group. Now compute

$$A_1 = \frac{2p^2 + 3p - 1}{6(p+1)(g-1)}\left(\sum_{i=1}^{5}\frac{1}{n_i - 1} - \frac{1}{N-g}\right) = .17012$$

and

$$f_1 = \tfrac{1}{2}p(p+1)(g-1) = 84,$$

and enter $(1 - A_1)M = 93.4$ in the $\chi^2$ distribution with 84 degrees of freedom.
Since the probability of getting this value of $\chi^2$ or larger is fairly high, the
null hypothesis of equal variance-covariance matrices is not rejected.
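The correction factor and degrees of freedom for this test are simple arithmetic. Here is a plain-Python sketch (the function name is ours) that takes the within-group degrees of freedom $n_k - 1$ as input, the form that reproduces the value .17012 above, together with the statistic M reported in the text:

```python
def box_m_correction(dfs, p):
    """Scale factor A_1 and chi-square degrees of freedom f_1 for Box's test
    of equal covariance matrices.

    dfs: within-group degrees of freedom n_k - 1; their sum is N - g.
    """
    g = len(dfs)
    pooled_df = sum(dfs)
    a1 = ((2 * p ** 2 + 3 * p - 1) / (6 * (p + 1) * (g - 1))
          * (sum(1 / d for d in dfs) - 1 / pooled_df))
    f1 = p * (p + 1) * (g - 1) // 2
    return a1, f1


sizes = [59, 13, 15, 32, 9]                 # groups A-E of Table 3
a1, f1 = box_m_correction([n - 1 for n in sizes], p=6)
M = 112.6565                                # statistic reported in the text
print(round(a1, 5), f1, round((1 - a1) * M, 2))
```

The corrected statistic is then referred to the $\chi^2$ distribution with $f_1$ degrees of freedom; computing that tail probability is left out of this sketch.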

An estimate of the matrix Σ is given by the pooled variance-covariance
matrix

$$
S =
\begin{bmatrix}
3.100 & .101 & -.279 & -.083 & -.009 & 1.557 \\
.101 & 5.780 & 1.018 & -.114 & -1.014 & .039 \\
-.279 & 1.018 & 5.560 & 1.039 & 1.366 & -.169 \\
-.083 & -.114 & 1.039 & 5.600 & 3.080 & .258 \\
-.009 & -1.014 & 1.366 & 3.080 & 6.820 & .222 \\
1.557 & .039 & -.169 & .258 & .222 & 5.170
\end{bmatrix}.
$$


Consider now whether the hypothesis of equal variances and equal
covariances is consistent with S. The best estimate of the uniform
variance-covariance matrix under this hypothesis is given by

$$
S_0 =
\begin{bmatrix}
5.3383 & .467 & \cdots & .467 \\
.467 & 5.3383 & \cdots & .467 \\
\vdots & \vdots & & \vdots \\
.467 & .467 & \cdots & 5.3383
\end{bmatrix},
$$

where the diagonal element is an average of the 6 variances in S and the
covariance is an average of the 15 [$\tfrac{1}{2}p(p-1)$] covariances in S. The reason
for testing this hypothesis is that if $S_0$ is consistent with the data, then classical
analysis of variance procedures are applicable. The test used is again a
likelihood ratio test, also given by Box [3, 4]. The test statistic is








$$M = -(N-g)\log_e\frac{|S|}{|S_0|}, \qquad |S| = 10{,}806.42,$$

where (N − g) = 123 is the degrees of freedom entering into the computation
of any element in S or $S_0$. Now compute

$$A_2 = \frac{p(p+1)^2(2p-3)}{6(N-g)(p-1)(p^2+p-4)} = .0189$$

and

$$f_2 = (p^2 + p - 4)/2 = 19,$$

and enter $(1 - A_2)M = 80.4$ in the $\chi^2$ tables with $f_2 = 19$ degrees of freedom.
The probability of this result is well below .001; the hypothesis of equal
variances and equal covariances must be rejected.
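The uniform estimate $S_0$ and the constants $A_2$ and $f_2$ can be computed as follows. This is a plain-Python sketch with hypothetical function names, using the pooled matrix S as read from the printed table; it does not compute the determinants |S| and $|S_0|$ that enter M.

```python
def uniform_estimate(s):
    """Equal-variance, equal-covariance estimate S_0: the average of the
    diagonal and the average of the p(p-1)/2 off-diagonal elements of s."""
    p = len(s)
    var = sum(s[i][i] for i in range(p)) / p
    cov = sum(s[i][j] for i in range(p) for j in range(i + 1, p)) / (p * (p - 1) / 2)
    return [[var if i == j else cov for j in range(p)] for i in range(p)]


def compound_symmetry_correction(p, df):
    """Scale factor A_2 and chi-square degrees of freedom f_2; df is N - g."""
    a2 = p * (p + 1) ** 2 * (2 * p - 3) / (6 * df * (p - 1) * (p ** 2 + p - 4))
    f2 = (p ** 2 + p - 4) // 2
    return a2, f2


S = [
    [3.100, 0.101, -0.279, -0.083, -0.009, 1.557],
    [0.101, 5.780, 1.018, -0.114, -1.014, 0.039],
    [-0.279, 1.018, 5.560, 1.039, 1.366, -0.169],
    [-0.083, -0.114, 1.039, 5.600, 3.080, 0.258],
    [-0.009, -1.014, 1.366, 3.080, 6.820, 0.222],
    [1.557, 0.039, -0.169, 0.258, 0.222, 5.170],
]
S0 = uniform_estimate(S)
a2, f2 = compound_symmetry_correction(p=6, df=123)
print(round(S0[0][0], 4), round(S0[0][1], 3), round(a2, 5), f2)
```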

The analysis of variance yields the numerical results of Table 4. 


TABLE 4 


Analysis of Variance 











Source                                d.f.      S.S.      M.S.      F
Tests                                   5     5092.56
Groups                                  4      509.12    127.28    $F_2$ = 16.51
Individuals within Groups             123      948.41      7.71
Groups x Tests                         20      644.74     32.24    $F_3$ = 6.63
Individuals x Tests within Groups     615     2991.04      4.86





Of primary interest is the test of the homogeneity of group profiles, which
is a test for the existence of the group-test interaction. For this purpose
enter the $F_3$ value in the F table with (g − 1)(p − 1)ε and (N − g)(p − 1)ε,
or with 20ε and 615ε, degrees of freedom. From the previous formula and
the elements in the S matrix, ε is estimated to be .8194. Therefore the effective
degrees of freedom are 16 and 503. The observed $F_3 = 6.63$ is greater than
the .001 point for F with 15 and 120 degrees of freedom; since the .001 point
decreases as the degrees of freedom increase, exceeding it for 15 and 120
implies exceeding it for 16 and 503. One therefore rejects the hypothesis of
no interaction and concludes that the mean profiles differ in shape from
group to group.

The conservative test, which of course does not require the computation
of ε, would enter $F_3 = 6.63$ in the F tables with g − 1 = 4 and N − g = 123
degrees of freedom. The .001 point for F with these degrees of freedom is
4.95. In this case, therefore, the conservative test yields the same conclusion
as the approximate test, namely, that the differences in the shapes of the
group profiles are significant at the .001 level.
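The arithmetic of this example can be retraced from the sums of squares in Table 4. In the sketch below (plain Python; `effective_df` is a hypothetical helper), the ε-adjusted degrees of freedom are rounded down, which matches the 16 and 503 quoted above:

```python
import math


def effective_df(df1, df2, eps):
    """Degrees of freedom multiplied by epsilon and rounded down."""
    return math.floor(df1 * eps), math.floor(df2 * eps)


ms_interaction = 644.74 / 20      # Groups x Tests, Table 4
ms_error = 2991.04 / 615          # Individuals x Tests within Groups, Table 4
f3 = ms_interaction / ms_error

p, g, N = 6, 5, 128
approx_df = effective_df((g - 1) * (p - 1), (N - g) * (p - 1), 0.8194)
conservative_df = (g - 1, N - g)  # the full d.f. multiplied by 1/(p - 1)
print(round(f3, 2), approx_df, conservative_df)
```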

The groups clearly differ with regard to levels, as can be seen from the
very large $F_2$ value.


Other Procedures 


The foregoing procedures present approximate and conservative tests 
of significance resulting from the analysis of variance utilizing readily available 
tables of the F distribution. As mentioned earlier, exact procedures are
available in the multivariate analysis of variance. These procedures lead
to exact tests of the general hypothesis in multivariate analysis of the equality
of vector means among g populations and of the existence of the group-item
or group-test interaction of interest in profile analysis. However, all of these
procedures require laborious computations involving the inversion of (p × p)
matrices (p equal to the number of tests or items) and the computation of 
latent roots or the evaluation of determinants. A further complication is 
the lack of tabled probability values for the appropriate test statistics. 
Recently, however, distribution tables have appeared relating to the approach 
of multivariate analysis initially taken by Roy [13]. Under this view, the 
distribution of the test statistic is dependent upon the distribution of the 
maximum characteristic root of certain matrices. The most comprehensive 
tables or charts thus far available are those given by Heck [9]. Heck, inciden- 
tally, specifically considers the problem of profile analysis. 

The case for two groups will be developed in some detail to illustrate the 
principles involved and then the extension to g groups as given by Heck 
will be summarized briefly. The former situation leads to Hotelling's
generalized $T^2$ statistic and is implied in the literature on multivariate analysis.

In the previous notation, $x_{ijk}$ is an observation on item j for individual
i in group k, and $\bar{x}_{.jk}$ is the mean of character j in group k. The range of
subscripts here is k = 1, 2; j = 1, 2, ..., p; and i = 1, 2, ..., $n_k$. As before,
assume that the random vector $x'_{(ik)} = (x_{i1k}, \ldots, x_{ipk})$ is $N(\mu_{(k)}, \Sigma)$, that
is, the p variables have a multivariate normal distribution in population k
with mean vector $\mu'_{(k)} = (\mu_{1k}, \ldots, \mu_{pk})$ and variance-covariance matrix
Σ, which is common to the g populations. The hypothesis to be tested for
g = 2 is

$$\mu_{11} - \mu_{12} = \mu_{21} - \mu_{22} = \cdots = \mu_{p1} - \mu_{p2}.$$


Transform the p variates in x to p − 1 variates in y as follows (see [1],
pp. 110-112 and [12], pp. 239-244):

$$y_{(ik)} = C x_{(ik)},$$

where $C = [c_{rj}]$ is a (p − 1) × p matrix such that $\sum_{j=1}^{p} c_{rj} = 0$ for each row r.
The matrix C, subject to the restriction, can be perfectly arbitrary. For example,








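$$
C =
\begin{bmatrix}
1 & -1 & 0 & \cdots & 0 \\
1 & 0 & -1 & \cdots & 0 \\
\vdots & \vdots & \vdots & & \vdots \\
1 & 0 & 0 & \cdots & -1
\end{bmatrix}
$$

subtracts $x_2, \ldots, x_p$ from the first variate, resulting in $y_1 = x_1 - x_2$,
$y_2 = x_1 - x_3$, ..., $y_{p-1} = x_1 - x_p$. Or,

$$
C = \frac{1}{p}
\begin{bmatrix}
p-1 & -1 & \cdots & -1 & -1 \\
-1 & p-1 & \cdots & -1 & -1 \\
\vdots & & \ddots & & \vdots \\
-1 & -1 & \cdots & p-1 & -1
\end{bmatrix},
$$

which in effect subtracts from each of the first p − 1 variates their mean
$\bar{x} = (1/p)\sum_{j=1}^{p} x_j$, resulting in $y_1 = x_1 - \bar{x}$, ..., $y_{p-1} = x_{p-1} - \bar{x}$. Using the first
transformation above, the vector $y'_{(ik)} = (y_{i1k}, \ldots, y_{i(p-1)k})$ is multivariate
normal with mean $\eta'_{(k)} = (\eta_{1k}, \ldots, \eta_{(p-1)k})$, where $\eta_{jk} = \mu_{1k} - \mu_{(j+1)k}$, and
variance-covariance matrix $C\Sigma C'$, where the prime denotes the transpose of a matrix.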
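Both choices of C can be written out explicitly. The plain-Python sketch below (helper names hypothetical) builds each matrix, forms y = Cx and $C\Sigma C'$, and can be used to check that every row sums to zero. Applied to the rounded group B means of Table 3, the first C gives differences that agree within .01 with the group B mean vector used in the two-group example.

```python
def difference_matrix(p):
    """First choice of C: row r produces y_r = x_1 - x_{r+1}, r = 1, ..., p-1."""
    return [[1.0 if j == 0 else (-1.0 if j == r + 1 else 0.0) for j in range(p)]
            for r in range(p - 1)]


def centering_matrix(p):
    """Second choice of C: row r produces y_r = x_r - mean(x), r = 1, ..., p-1."""
    return [[(p - 1) / p if j == r else -1.0 / p for j in range(p)] for r in range(p - 1)]


def transform(c, x):
    """y = C x for a (p-1) x p matrix C and a p-vector x."""
    return [sum(cj * xj for cj, xj in zip(row, x)) for row in c]


def congruence(c, sigma):
    """C Sigma C', the covariance matrix of y = C x."""
    p = len(sigma)
    cs = [[sum(row[t] * sigma[t][s] for t in range(p)) for s in range(p)] for row in c]
    return [[sum(cs_row[s] * c_row[s] for s in range(p)) for c_row in c] for cs_row in cs]


group_b = [17.92, 13.85, 17.23, 14.00, 12.23, 17.38]   # group B means, Table 3
C = difference_matrix(6)
print([round(v, 2) for v in transform(C, group_b)])
print(congruence(difference_matrix(3), [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]))
```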
After transforming the p x-variates into the p − 1 y-variates for each
of the n = $n_1 + n_2$ individuals, the group means in the y's are

$$\bar{y}_{.11}, \bar{y}_{.21}, \ldots, \bar{y}_{.(p-1)1} \qquad \text{and} \qquad \bar{y}_{.12}, \bar{y}_{.22}, \ldots, \bar{y}_{.(p-1)2},$$

and the pooled sample variance-covariance matrix in the y's is $W = [w_{rs}]$,
where

$$w_{rs} = \frac{1}{n_1 + n_2 - 2}\left\{\sum_{i=1}^{n_1}(y_{ir1} - \bar{y}_{.r1})(y_{is1} - \bar{y}_{.s1}) + \sum_{i=1}^{n_2}(y_{ir2} - \bar{y}_{.r2})(y_{is2} - \bar{y}_{.s2})\right\}$$

and r, s = 1, 2, ..., p − 1. It is easily seen that the null hypothesis in the
x's is equivalent to the following hypothesis in the y's:

$$\eta_{j1} = \mu_{11} - \mu_{(j+1)1} = \eta_{j2} = \mu_{12} - \mu_{(j+1)2}, \qquad j = 1, 2, \ldots, (p-1);$$

i.e., $\eta_{(1)} = \eta_{(2)}$. But this is the general hypothesis of multivariate analysis
of the equality of mean vectors for two groups, and it is well known that the
appropriate statistic to test this hypothesis is $T^2$. Therefore














$$T^2 = \frac{n_1 n_2}{n_1 + n_2}(\bar{y}_{(1)} - \bar{y}_{(2)})' W^{-1} (\bar{y}_{(1)} - \bar{y}_{(2)}) = \frac{n_1 n_2}{n_1 + n_2}\sum_{r}\sum_{s} w^{rs}(\bar{y}_{.r1} - \bar{y}_{.r2})(\bar{y}_{.s1} - \bar{y}_{.s2}),$$

where $w^{rs}$ is element rs in the inverse matrix $W^{-1}$. This statistic has the $T^2$
distribution with $n_1 + n_2 - 2$ degrees of freedom.
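To test the hypothesis at level α, enter

$$\frac{T^2(n_1 + n_2 - p)}{(n_1 + n_2 - 2)(p - 1)}$$

in the F table with p − 1 and $n_1 + n_2 - p$ degrees of freedom. If

$$\frac{T^2(n_1 + n_2 - p)}{(n_1 + n_2 - 2)(p - 1)} > F_\alpha(p - 1,\; n_1 + n_2 - p),$$

where $F_\alpha$ denotes the upper α point of the F distribution, reject the
hypothesis; otherwise accept.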
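The $T^2$ statistic and its F transformation can be sketched in plain Python (function names are ours; a linear solve replaces the explicit inverse $W^{-1}$). The data are the group B and E values of the example given later, with the printed upper triangle of W mirrored to a full symmetric matrix; with these rounded entries the result agrees with the reported $T^2 = 27.048$ to within about .01.

```python
def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [[float(v) for v in row] + [float(rhs)] for row, rhs in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def hotelling_t2(mean1, mean2, w, n1, n2):
    """T^2 = n1 n2 / (n1 + n2) * d' W^{-1} d, with W the pooled covariance matrix."""
    d = [a - b for a, b in zip(mean1, mean2)]
    return n1 * n2 / (n1 + n2) * sum(di * xi for di, xi in zip(d, solve(w, d)))


def t2_to_f(t2, n1, n2, p):
    """F statistic with p - 1 and n1 + n2 - p d.f.; p counts the original x variables."""
    return t2 * (n1 + n2 - p) / ((n1 + n2 - 2) * (p - 1))


# Groups B and E of the example (y = differences from x_1), full symmetric W.
y_b = [4.077, 0.692, 3.923, 5.692, 0.538]
y_e = [7.556, 4.444, 3.889, 6.222, -0.444]
W = [
    [5.057, 2.854, 0.382, 0.560, 2.384],
    [2.854, 7.250, 0.906, 3.994, 2.196],
    [0.382, 0.906, 5.090, 5.696, 1.155],
    [0.560, 3.994, 5.696, 12.716, 3.002],
    [2.384, 2.196, 1.155, 3.002, 5.272],
]
t2 = hotelling_t2(y_b, y_e, W, 13, 9)
print(round(t2, 2), round(t2_to_f(t2, 13, 9, 6), 3))
```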

The general case for g populations, of which the above is a special case,
is given by Heck [9]. The extension is obvious. From the g by p − 1 table
of group means, one computes the between groups sums of squares and cross
products to obtain the elements of the matrix B, say. Thus element rs of
this matrix is

$$b_{rs} = \sum_{k=1}^{g} n_k(\bar{y}_{.rk} - \bar{y}_{.r.})(\bar{y}_{.sk} - \bar{y}_{.s.}) = \sum_{k=1}^{g} n_k\bar{y}_{.rk}\bar{y}_{.sk} - N\bar{y}_{.r.}\bar{y}_{.s.},$$

where r, s = 1, 2, ..., (p − 1). For the error matrix W, compute similarly
the sums of products, so that

$$w_{rs} = \sum_{k=1}^{g}\sum_{i=1}^{n_k}(y_{irk} - \bar{y}_{.rk})(y_{isk} - \bar{y}_{.sk}) = \sum_{k=1}^{g}\sum_{i=1}^{n_k} y_{irk}y_{isk} - \sum_{k=1}^{g} n_k\bar{y}_{.rk}\bar{y}_{.sk}.$$

In the above formula, $\bar{y}_{.rk} = n_k^{-1}\sum_{i} y_{irk}$. The various test statistics
proposed are proportional to some function of the product matrix $BW^{-1}$.
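A plain-Python sketch of the B and W computations follows (the function name and the small data set are hypothetical). As a check, B + W equals the total sums of squares and products about the grand mean vector.

```python
def sscp_matrices(groups):
    """Between-groups (B) and within-groups (W) sums of squares and products.

    groups[k] is a list of y vectors (each of length p - 1) for group k.
    """
    q = len(groups[0][0])
    sizes = [len(gr) for gr in groups]
    n = sum(sizes)
    means = [[sum(y[r] for y in gr) / len(gr) for r in range(q)] for gr in groups]
    grand = [sum(nk * m[r] for nk, m in zip(sizes, means)) / n for r in range(q)]
    b = [[sum(nk * (m[r] - grand[r]) * (m[s] - grand[s]) for nk, m in zip(sizes, means))
          for s in range(q)] for r in range(q)]
    w = [[sum((y[r] - m[r]) * (y[s] - m[s]) for gr, m in zip(groups, means) for y in gr)
          for s in range(q)] for r in range(q)]
    return b, w


groups = [                                    # hypothetical y vectors, three groups
    [[1.0, 2.0], [2.0, 1.0], [3.0, 3.0]],
    [[4.0, 2.0], [5.0, 4.0]],
    [[0.0, 1.0], [1.0, 0.0], [2.0, 2.0], [1.0, 1.0]],
]
B, W = sscp_matrices(groups)
print(B, W)
```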

In the literature on multivariate analysis, there have been three
approaches to the distribution problem. Wilks [17], starting with the likelihood
ratio criterion, derived the test statistic $|I + BW^{-1}|^{-1}$, which is obviously
equal to the inverse of the product of the characteristic roots of $(I + BW^{-1})$,
I being the identity matrix. Hotelling [10] has proposed the distribution of
$\operatorname{tr} BW^{-1}$, or of the sum of the characteristic roots of $BW^{-1}$. Roy [13] has
proposed the consideration of the distribution of the maximum characteristic
root of $BW^{-1}$. For a further discussion of these three points of view consult
Anderson ([1], pp. 221-224). There are no probability tables available for
the first two test statistics, although the exact cumulative distribution of the
determinantal statistic is given by an infinite series of $\chi^2$'s, the first term of
which, for any reasonable N, gives an excellent approximation to the whole
series and is quite easy to compute ([1], p. 208 and [12], p. 261). Several
tables are available giving critical points of the distribution of the maximum 
characteristic root, the most extensive to date being due to Heck [9]. 
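The three criteria can be computed from B and W without any linear-algebra library. In the sketch below (plain Python, helper names ours, W assumed nonsingular), Wilks' statistic is obtained from a determinant, Hotelling's from a trace, and Roy's largest root by power iteration, a technique chosen here for brevity rather than taken from the sources cited.

```python
import math


def _matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def _inverse(m):
    """Gauss-Jordan inverse with partial pivoting (m assumed nonsingular)."""
    n = len(m)
    aug = [[float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(m)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        pv = aug[col][col]
        aug[col] = [v / pv for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [v - f * u for v, u in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def _det(m):
    """Determinant by Gaussian elimination with partial pivoting."""
    a = [[float(v) for v in row] for row in m]
    n, det = len(a), 1.0
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(a[r][col]))
        if a[piv][col] == 0.0:
            return 0.0
        if piv != col:
            a[col], a[piv] = a[piv], a[col]
            det = -det
        det *= a[col][col]
        for r in range(col + 1, n):
            f = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= f * a[col][c]
    return det


def multivariate_criteria(b, w, iterations=1000):
    """Wilks' |I + BW^-1|^-1, Hotelling's tr BW^-1, and Roy's largest root of BW^-1."""
    a = _matmul(b, _inverse(w))
    q = len(a)
    wilks = 1.0 / _det([[a[r][s] + (1.0 if r == s else 0.0) for s in range(q)]
                        for r in range(q)])
    trace = sum(a[r][r] for r in range(q))
    # Power iteration: the roots of BW^-1 are real and nonnegative, so the ratio
    # converges to the largest one, assuming the start vector has a component
    # along the corresponding eigenvector.
    v, root = [1.0] * q, 0.0
    for _ in range(iterations):
        av = [sum(a[r][s] * v[s] for s in range(q)) for r in range(q)]
        norm_av = math.sqrt(sum(x * x for x in av))
        if norm_av == 0.0:
            break
        root = norm_av / math.sqrt(sum(x * x for x in v))
        v = [x / norm_av for x in av]
    return wilks, trace, root


print(multivariate_criteria([[2.0, 1.0], [1.0, 2.0]], [[1.0, 0.0], [0.0, 1.0]]))
```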

As an illustration of the $T^2$ procedure, and also to compare the F tests
with the $T^2$ test, the two smallest groups, B and E, are selected for testing
the equality of profile shapes. The mean profiles on the six items are given
in Table 3. The analysis of variance performed on these two groups only is 
displayed in Table 5. 


TABLE 5 


Analysis of Variance 











Source                                d.f.     S.S.     M.S.      F
Tests                                   5     733.03
Groups                                  1      16.25    16.25
Individuals within Groups              20     111.84     5.59
Groups x Tests                          5     105.56    21.11    $F_3$ = 4.84
Individuals x Tests within Groups     100     435.58     4.36





To test the hypothesis of no group-test interaction, that is, of equality
of the profile shapes, consider $F_3$, distributed as $F(5\epsilon, 100\epsilon)$. From the pooled
sample variance-covariance matrix, an estimate for ε is .6727. Therefore,
for the approximate test, enter $F_3 = 4.84$ in the F distribution with 3 and
67 degrees of freedom. This test yields

$$.005 > P[F(3, 67) > 4.84] > .001, \qquad \text{with} \qquad P[F(3, 67) > 4.84] \approx .004.$$

The minimum value ε can assume is 1/5. Therefore, for the conservative
test, enter $F_3$ in the F distribution with 1 and 20 degrees of freedom. This
test yields

$$.05 > P[F(1, 20) > 4.84] > .025, \qquad \text{with} \qquad P[F(1, 20) > 4.84] \approx .04.$$

As previously indicated, to carry out the exact test it is necessary to
reduce the dimensionality of the vector from p to p − 1. This is accomplished
by subtracting $x_2$, $x_3$, $x_4$, $x_5$, and $x_6$ from $x_1$, obtaining a 5-dimensional
vector y on each of the 22 individuals. The mean vector for group B is

$$\bar{y}_B = (4.077, .692, 3.923, 5.692, .538)$$

and for group E is

$$\bar{y}_E = (7.556, 4.444, 3.889, 6.222, -.444).$$

The pooled sample variance-covariance matrix in the y's (symmetric; only
the upper triangle is shown) is

$$
W =
\begin{bmatrix}
5.057 & 2.854 & .382 & .560 & 2.384 \\
\cdot & 7.250 & .906 & 3.994 & 2.196 \\
\cdot & \cdot & 5.090 & 5.696 & 1.155 \\
\cdot & \cdot & \cdot & 12.716 & 3.002 \\
\cdot & \cdot & \cdot & \cdot & 5.272
\end{bmatrix}
$$

with $n_1 + n_2 - 2 = 20$ degrees of freedom. Therefore,








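$$T^2 = \frac{n_1 n_2}{n_1 + n_2}(\bar{y}_B - \bar{y}_E)' W^{-1}(\bar{y}_B - \bar{y}_E) = 27.048.$$

Now enter

$$\frac{T^2(n_1 + n_2 - p)}{(n_1 + n_2 - 2)(p - 1)} = \frac{27.048 \times 16}{20 \times 5} = 4.328$$

(p = 6, the number of x variables) in the F distribution with p − 1 = 5
and $n_1 + n_2 - p = 16$ degrees of freedom. This exact test yields

$$.025 > P[F(5, 16) > 4.328] > .01, \qquad \text{with} \qquad P[F(5, 16) > 4.328] \approx .01.$$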


By presenting these three tests, no more is implied than that all three 
tests yield significant results at the .05 level. It is not appropriate to compare 
the approximate and conservative tests with the exact test based on the 
distribution of any of the functions of the latent roots previously discussed, 
all of which reduce to Hotelling's $T^2$ in the case of two groups. The reason
is that there are two exact tests involved: one based on a function of the
latent roots, the second based on the distribution of the ratio of linear
combinations of $\chi^2$ variables. It should be emphasized that the first two tests
given in the example are, respectively, approximate and conservative versions
of the latter. The conservative test provides a procedure which is more than
"rough and ready" and yet saves considerable time, since it requires neither
a matrix inversion nor even the computation of a covariance matrix. This is
particularly true when p, the number of variables, is large and the number of
samples is greater than two.

The question of electronic computers is another matter. Given the 
availability of a classical analysis of variance program and the availability 
of a combined program to carry out the multivariate analysis of variance 
involving the between samples variance-covariance matrix, the inverse 
of the error variance-covariance matrix, and the extraction of the maximum 
latent root of the product of the two matrices, it is very likely that the former 
would require less machine time. However, the difference is probably of no 
practical importance and the exact procedure should be used. 

A more fundamental question relates to a comparison of the two exact 
tests involved. Are the multivariate analysis of variance procedures depending 
upon the distribution of $BW^{-1}$ more powerful against all alternatives than
the distribution of the ratio of linear sums of $\chi^2$ variates? It is not clear that
this is so, particularly with regard to the analysis of profile shapes where 
the former procedures must reduce the dimensionality of the random vector. 

If one does decide to use the F tests in an analysis, the following series
of steps is suggested. After finding the traditional analysis of variance
table, first test the appropriate observed F value in the F distribution with
full, i.e., unreduced, degrees of freedom. For $F_3$, for example, this would
be F with (p − 1)(g − 1) and (p − 1)(N − g) degrees of freedom. If $F_3$
is smaller than the critical point at level α, one can stop here, for the null
hypothesis will not be rejected after any further reduction of the degrees of
freedom, since reducing them can only raise the critical point. If the observed
F is significant, then one proceeds to the conservative test, where the degrees
of freedom are reduced by a factor equal to 1/(p − 1). For $F_3$, the appropriate
F distribution is F(g − 1, N − g). If this test leads to significance at the α
level, one can at this point reject the null hypothesis without further testing.
However, if the conservative test is not significant, then it is suggested that
ε be estimated from the variance-covariance matrix and the approximate test
be carried out.
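The three-step procedure can be sketched as follows. Because the sketch needs F tail probabilities and does not assume any statistics library, it evaluates them through the regularized incomplete beta function with the standard continued-fraction expansion (modified Lentz method) found in common numerical references. The function names, the rounding down of the ε-adjusted degrees of freedom (mirroring the worked example), and the returned strings are illustrative choices, not part of the paper.

```python
import math


def _betacf(a, b, x, max_iter=300, eps=1e-14):
    """Continued fraction for the incomplete beta function (modified Lentz method)."""
    tiny = 1e-300
    c = 1.0
    d = 1.0 - (a + b) * x / (a + 1.0)
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        even = m * (b - m) * x / ((a + 2 * m - 1) * (a + 2 * m))
        odd = -(a + m) * (a + b + m) * x / ((a + 2 * m) * (a + 2 * m + 1))
        for coef in (even, odd):
            d = 1.0 + coef * d
            d = 1.0 / (d if abs(d) > tiny else tiny)
            c = 1.0 + coef / c
            c = c if abs(c) > tiny else tiny
            h *= d * c
        if abs(d * c - 1.0) < eps:
            break
    return h


def reg_inc_beta(a, b, x):
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    log_front = (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                 + a * math.log(x) + b * math.log1p(-x))
    if x < (a + 1.0) / (a + b + 2.0):
        return math.exp(log_front) * _betacf(a, b, x) / a
    return 1.0 - math.exp(log_front) * _betacf(b, a, 1.0 - x) / b


def f_upper_tail(f, d1, d2):
    """P[F(d1, d2) > f]."""
    return reg_inc_beta(d2 / 2.0, d1 / 2.0, d2 / (d2 + d1 * f))


def stepwise_f_test(f_obs, p, g, n_total, alpha, eps_hat=None):
    """Three-step assessment of F_3 (equal profile shapes) following the text."""
    df1, df2 = (p - 1) * (g - 1), (p - 1) * (n_total - g)
    # Step 1: full degrees of freedom; reducing them can only raise the critical point.
    if f_upper_tail(f_obs, df1, df2) >= alpha:
        return "retain: not significant with full degrees of freedom"
    # Step 2: conservative test, degrees of freedom divided by p - 1.
    if f_upper_tail(f_obs, g - 1, n_total - g) < alpha:
        return "reject: significant on the conservative test"
    if eps_hat is None:
        return "undecided: estimate epsilon and carry out the approximate test"
    # Step 3: epsilon-reduced degrees of freedom, rounded down but never below
    # the conservative values (epsilon >= 1/(p - 1)).
    a1 = max(g - 1, math.floor(df1 * eps_hat))
    a2 = max(n_total - g, math.floor(df2 * eps_hat))
    if f_upper_tail(f_obs, a1, a2) < alpha:
        return "reject: significant on the approximate test"
    return "retain: not significant on the approximate test"


print(stepwise_f_test(6.63, p=6, g=5, n_total=128, alpha=0.001))
print(stepwise_f_test(4.84, p=6, g=2, n_total=22, alpha=0.01, eps_hat=0.6727))
```

With the figures of the two examples, the five-group shape test is settled at the conservative step, while the two-group test at the .01 level needs the estimated ε.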


Number of Individuals Less than the Number of Variables 


As indicated in the introduction, in the case of one group, if $(n - 1) < p$,
or in the case of g groups, if $(N - g) < p$, it is not possible to apply
multivariate procedures. The reason, of course, is that the error matrix, W, is
singular. Such situations are not too uncommon, especially in research in 
clinical psychology and psychiatry. Clearly the approximate F tests presented 
are not applicable either since the reduction in degrees of freedom is dependent 
upon the elements of a singular matrix. However, the conservative test 
can be applied. 
















Unequal Variance-Covariance Matrices 


Perhaps one of the most important uses of the conservative test is 
in the situation where one cannot assume the equality of the unknown 
variance-covariance matrices in the p-variate normal populations being 
sampled. For this case, there are no exact procedures available. Note that
in the special case $p = 1$, $g = 2$, this problem reduces to the Fisher-Behrens problem.

Here, in order for the F statistics to be unbiased, it is necessary to work
with equal sample sizes in the groups, i.e., $n_1 = \cdots = n_g = n$. Therefore,
$N = \sum n_i = gn$. It can again be shown that the respective numerator and
denominator quadratic forms entering into $F_1$, $F_2$, and $F_3$ are independent and
have the same expectations. Now, however, when an F distribution is used
to approximate these F statistics (see [5], theorem 6.1), it turns out that
there are different factors reducing the numerator and denominator degrees
of freedom, and these in turn differ for the three F statistics. Here again it
can be shown that these $\epsilon$'s have lower limits which, when applied to the
appropriate degrees of freedom, result in a conservative test for assessing the
significance of $F_1$, $F_2$, and $F_3$ by entering these in the F distribution with 1
and $n - 1$ degrees of freedom. It is of interest that the $F_1$ test, when $p = 1$
and $g = 2$, is a conservative test for the various approximate solutions given
to the Fisher-Behrens problem of testing the equality of two means with
unequal variances (e.g., [15], p. 295).


REFERENCES 


[1] Anderson, T. W. Introduction to multivariate statistical analysis. New York: Wiley, 1958.

[2] Block, J., Levine, L., and McNemar, Q. Testing for the existence of psychometric patterns. J. abnorm. soc. Psychol., 1951, 46, 356-359.

[3] Box, G. E. P. A general distribution theory for a class of likelihood criteria. Biometrika, 1949, 36, 317-346.

[4] Box, G. E. P. Problems in the analysis of growth and wear curves. Biometrics, 1950, 6, 362-389.

[5] Box, G. E. P. Some theorems on quadratic forms applied in the study of analysis of variance problems: I. Effect of inequality of variance in the one-way classification. Ann. math. Statist., 1954, 25, 290-302.

[6] Box, G. E. P. Some theorems on quadratic forms applied in the study of analysis of variance problems: II. Effects of inequality of variance and of correlation between errors in the two-way classification. Ann. math. Statist., 1954, 25, 484-498.

[7] Eisenhart, C. The assumptions underlying the analysis of variance. Biometrics, 1947, 3, 1-21.

[8] Geisser, S. and Greenhouse, S. W. An extension of Box's results on the use of the F distribution in multivariate analysis. Ann. math. Statist., 1958, 29, 885-891.

[9] Heck, D. L. Some uses of the distribution of the largest root in multivariate analysis. Inst. Statist. Univ. North Carolina, Mimeo. Ser. No. 194, 1958.

[10] Hotelling, H. A generalized T test and measure of multivariate dispersion. Proceedings of the second Berkeley symposium on mathematical statistics and probability. Berkeley: Univ. Calif. Press, 1951, 23-42.










[11] Kullback, S. An application of information theory to multivariate analysis, II. Ann. math. Statist., 1956, 27, 122-146.

[12] Rao, C. R. Advanced statistical methods in biometric research. New York: Wiley, 1952. 

[13] Roy, S. N. On a heuristic method of test construction and its use in multivariate 
analysis. Ann. math. Statist., 1953, 24, 220-238. 

[14] Scheffé, H. A “mixed model” for the analysis of variance. Ann. math. Statist., 1956, 
27, 23-36. 

[15] Welch, B. L. Note on Mrs. Aspin’s Tables and on certain approximations to the 
tabled functions. Biometrika, 1949, 36, 293-296. 

[16] Wilk, M. B. and Kempthorne, O. Fixed, mixed, and random models. J. Amer. statist. 
Ass., 1955, 50, 1144-1167. 

[17] Wilks, S. S. Certain generalizations in the analysis of variance. Biometrika, 1932, 24, 
471-494. 


Manuscript received 8/21/58 


Revised manuscript received 12/1/58 























PSYCHOMETRIKA—VOL. 24, No. 2 
JUNE, 1959 


DOMAIN SAMPLING FORMULATION OF CLUSTER AND 
FACTOR ANALYSIS 


ROBERT C. TRYON


UNIVERSITY OF CALIFORNIA, BERKELEY 


Domain sampling principles permit formulation of a general method
of multidimensional analysis. Cluster and factor analysis methods are special
cases stemming from decisions made at different stages of the general method,
especially in defining an independent dimension. Key cluster analyses define
a dimension as a selection of s variables drawn from the full n set. Centroid,
principal axes, and maximum likelihood analyses define it by the n variables
(raw or residual, weighted or unweighted); bifactor and second-order analysis,
by both types of selection; square root analysis, by one variable. Key
cluster methods can be designed to test hypotheses. 


The definition of a variable as the sum of a sample set of scored responses 
(e.g., to test items) selected to be representative of a defined domain of 
behavior is a basic principle of psychometrics. This standard practice may 
be expressed in a simple algebraic fashion which leads to an integration of 
the plethora of formulations of the reliability coefficient [39]. When a test is 
included among n variables, domain sampling algebra also provides a definitive 
solution of its communality [40]. These principles have been shown to imple- 
ment the broad logic of multidimensional analysis by the psychometric 
procedures called cluster analysis [42]. The most generally applicable compu- 
tational variant of cluster analysis, the CC method, has also recently been 
published [41]. 

Completing this group of papers on domain sampling formulations, this 
article has as its purpose, first, to state the general case of multidimensional 
analysis, and, second, to develop from it important special cases that are 
variant methods of cluster analysis. Some of these special forms have, how- 
ever, been otherwise known over the last half century as factor analysis 
methods, their main originators being Spearman [29, 30], Thomson [31],
Burt [2, 3], Kelley [19, 20], Hotelling [14, 15], Thurstone [32, 34], Holzinger
[13], and Lawley [21]. The factor methods of Spearman, Kelley, Thurstone, 
and Holzinger are conceived as issuing from the basic factor theorem. The 
assumptions are that a test score results from underlying, uncorrelated and 
additive true (general and multiple), specific, and error factors. These 
restrictive assumptions of factor theory are difficult to justify on substantive 
biological and psychological grounds [36]. This paper shows that when the 
factor methods are recast as variants of cluster analysis such assumptions 
about the components of a test score are unnecessary restrictions. 




TABLE 1
DECISIONS THAT DISTINGUISH VARIOUS METHODS OF MULTIDIMENSIONAL ANALYSIS

Columns. KEY CLUSTER, residual cluster: (1) Tot Com (TC), (2) Cum Com (CC). KEY CLUSTER, precluster: (3) Preclus Cum Com (PCC)^a, (4) Ration Cum Com (RCC). PIVOT: (5) Square Root (PV). GEN'L AND KEY CLUSTER: (6) Bifactor. GENERAL CLUSTER: (7) Centroid (Unwgt), (8) Principal Axes (Wgt), (9) Maximum Likelihood. Columns 5 to 9 are the factor analysis variants; for these, the second line of each row ("Factor") gives the decision of the orthodox factor formulation.

REGION OF DECISION            (1)        (2)        (3)        (4)        (5)        (6)        (7)        (8)        (9)
                              TC         CC         PCC^a      RCC        PV         Bifactor   Centroid   Prin Axes  Max Lik
PRELIMINARY DECISIONS
a  Reflection                 Yes        Yes        Yes        Yes        Yes        Yes        Yes        No         No
     Factor                                                               No         No         Yes        No         No
b  Precluster                 No         No         Empiric    Rational   Yes        No         No         No         No
     Factor                                                               No         Yes        No         No         No
DIAGONAL ENTRIES
c  Initial communalities      Simul quad Approx     Approx^b   Approx^b   Approx     Approx     Approx     Approx     No
     Factor                                                               Unities    No         Highest r  Unities    No
DIMENSIONALITY
d  Defining variables of C_r  s_r        s_r        s_r        s_r        s_r        n and s_r  n          n (wgt)    n (wgt)
     Factor                                                               1^c        n and s_r  n          n (wgt)    n (wgt)
e  Partial communalities      Sim Σ      Sim Σ      Sim Σ      Sim Σ      Sim Σ      Sim Σ      Sim Σ      Iter       Iter
     Factor                                                               Sim Σ^c    Appr B     Sim Σ      Iter       Iter
f  Terminating criteria^d     T          T          T          T          T          T          T          T          Resid r
     Factor                                                               Salient    k + 1      Resid r    Resid r    Resid r
g  Reiterate factoring        No         Yes        Yes        Yes        Yes        Yes        Yes        Yes        Prog k
     Factor                                                               No         No         No         No         Prog k
STRUCTURE
h  Oblique analysis           From d     From d     From b,d   From b,d   From d     From d     Rotate     Rotate     Rotate
     Factor                                                               No         No         Rotate     No         No
SCORING
i  Oblique dimension scores   Σ          Σ          Σ          Σ          Σ          Σ          Regress    Regr       Regr
     Factor                                                               Σ          Regress    Regr^e     Regr       Regr

a For factor analysis methods (e.g., "multiple group"), see text.
b In the abridged form of the method, no initial communalities are necessary.
c Specified for the special case of s_r = 1 and communalities of unity.
d Sampling criteria may be used in place of, or in addition to, the T criterion.
e Regression in theory; cluster score by Σ in practice.



















The broad plan of this treatment is represented schematically in Table 
1. The general case of multidimensional analysis consists of progressive 
stages, or regions, in each of which decisions by the analyst are required. 
These regions are summarily described in the lettered rows a to i, extreme
left column of Table 1. The numbered columns 1 to 9 are the special cases, 
each defined by the particular pattern of decisions listed down its column. 
The first four cases form the Key Cluster methods, columns 1 to 4. The 
remaining groups are other cluster analysis variants known also as factor 
analysis methods (or schools): their descriptive names are given below the 
column numbers. For example, column 7 defines the Thurstone centroid-
simple structure method. In this group of methods (columns 5 to 9), note that in each region of
decision (or row) there are two entries. The first entry is a decision resting
on the conception of the method as a variant of cluster analysis. The second,
in italics and labelled at the left as “Factor,” is the decision made in orthodox
factor formulation. For example, down column 7 Thurstone’s decisions are
represented by the pattern of second, italicized, entries.

The general case of multidimensional analysis is given in the next section 
of the paper. Each region of analysis is taken up in order, the principles 
involved being illustrated by referring to some of the types of decisions listed 
along its row. In later sections the special cases will be taken up successively, 
that is, for each column of Table 1 the nature and rationale of its pattern 
of final decisions will be given, first, conceiving the method as a variant of 
cluster analysis (first cell entries), and, second, as an orthodox factor formu- 
lation (second cell entries). 


General Case of Multidimensional Analysis 


The basic data of the analysis are the intercorrelations between scores
on variables $X_1, X_2, \ldots, X_v, \ldots, X_n$. The over-all objective is to determine
and measure the smallest number, k, of dimensions that reproduce the
intercorrelations, entered as the off-diagonal elements of a correlation matrix. The successive
stages of the analysis, rows a to i of Table 1, achieve this objective. For
convenience, these stages are grouped under five subordinate objectives,
lettered A to E below.


A. Preliminary Decisions (Table 1, rows a, b) 


Reflection of Variables (row a). A main consideration in deciding
whether to reflect variables is the method of computing partial communalities
(squared factor loadings), row e. In the special cases of columns 1 to 7, where
the simple summation (Sim Σ) formula (to be developed later) is used,
reflection of variables, denoted by “Yes” in row a, is required.

Preclustering Variables (row b). Preclustering variables before dimensional 
analysis permits certain abbreviated or rational variants of multidimensional 
analysis (columns 3, 4, 5). In the other variants test clusters are located 










during the dimensional analysis (1, 2, 6) or by rotation of dimensions (7, 
8, 9). 


B. Determining Initial Communalities (Table 1, row c) 


The diagonal entries of the matrix are “self-correlations.” Communalities
are selected as diagonal elements because their use yields the smallest number,
k, of necessary and sufficient dimensions, or uncorrelated cluster domain
scores, that will reproduce the intercorrelations. If one used the reliability
coefficient of each variable as its diagonal element, more dimensions
than k would be required, and if one set each diagonal value at unity, even
more dimensions would be necessary.

As a correlation coefficient, the communality $h_v^2$ of a variable v is defined
as the correlation between the observed variable $X_v$ and a hypothetical
construct variable $X_{v'}$, measuring a different behavior property than $X_v$ but
having correlations across the n − 1 other variables equal to those of $X_v$
([40], formula 1), i.e.,

(1) $h_v^2 = r_{vv'}$.

As a variance, the fundamental definition of $h_v^2$ is the proportion of total
variance of the observed scores of v that is predictable from the construct
variable-domain score, $C_v$, defined as

(2) $C_v = z_v + z_{v'} + z_{v''} + \cdots$,

in which the observed $z_v$-scores are defined as one sample variable drawn
from an infinite set of construct variables, all members of which measure
different behavior properties, but whose correlations with the n − 1 variables
are proportional to those of v. From this construction it follows that ([40],
formula 5)

(3) $h_v^2 = r_{vC_v}^2$.


Proportionality of the correlations of v with those of another variable j means
specifically

(4) $r_{vi}/r_{ji} = \text{a constant} \qquad (i = 1, \ldots, n;\ i \ne v, j)$.
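Taking the following as an index of proportionality ([41], formula 6; [43]),

(5) $P_{vj}^2 = \dfrac{\left(\sum_i r_{vi} r_{ji}\right)^2}{\sum_i r_{vi}^2 \sum_i r_{ji}^2} \qquad (i = 1, \ldots, n;\ i \ne v, j)$,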


then under condition (4), $P_{vj}^2$ is unity. The definition of the variable-domain
$C_v$ in (2) is not restricted to a domain of variables with equal correlations
but merely to those with proportional r's, as defined in (4).

Communalities are in practice not computed by their defining formulas
(1) and (3) because the requisite construct variables are not available. In 
















Table 1, row c, note that in TC analysis (to be described later) a solution is 
attempted by a quadratic equation. But in remaining methods approximations 
are taken; after reiteration of the factoring procedure, row g, final converged 
values are achieved. 


C. Determining Dimensionality (Factoring) (Table 1, rows d, e, f, g) 


The object is to determine the value of k, the number of uncorrelated
composite variables or independent dimensions, $C_1, C_2, \ldots, C_r, \ldots, C_k$,
that could reproduce the correlation matrix, including diagonal communalities.

(a) The communality, $h_v^2$, of any variable v may be partitioned into
k partial communalities (squared factor loadings), or $h_{vr}^2$ values, as follows:

(6) $h_v^2 = h_{v1}^2 + h_{v2}^2 + \cdots + h_{vr}^2 + \cdots + h_{vk}^2$.


(b) The correlation coefficient of v with each other variable i is
reproduced, i.e., the observed $r_{vi}$ equals

(7) $r_{vi} = h_{v1}h_{i1} + h_{v2}h_{i2} + \cdots + h_{vr}h_{ir} + \cdots + h_{vk}h_{ik}$,


or, said another way, each of its residual correlations after removing the
variance from dimensions 1 to k is

(8) $r_{vi \cdot 12\cdots k} = r_{vi} - (h_{v1}h_{i1} + \cdots + h_{vk}h_{ik}) = 0$.
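Relations (6) to (8) amount to simple matrix arithmetic. The sketch below is a minimal illustration in plain Python lists, with hypothetical function names: it takes an n × k table of loadings $h_{vr}$, reproduces the correlations as in (7), and returns the residuals of (8). The diagonal of the reproduced matrix is the sum of partial communalities in (6).

```python
from typing import List

Matrix = List[List[float]]


def reproduce(loadings: Matrix) -> Matrix:
    """Reproduced matrix: entry (v, i) is the sum over r of h_vr * h_ir, as in (7).

    Diagonal entries are sums of squared loadings, the communalities of (6).
    """
    n = len(loadings)
    return [
        [sum(a * b for a, b in zip(loadings[v], loadings[i])) for i in range(n)]
        for v in range(n)
    ]


def residuals(R: Matrix, loadings: Matrix) -> Matrix:
    """Residual correlations after removing the given dimensions, as in (8).

    R should carry communalities in its diagonal, so the diagonal of the
    result holds residual communalities.
    """
    rep = reproduce(loadings)
    n = len(R)
    return [[R[v][i] - rep[v][i] for i in range(n)] for v in range(n)]
```

If the k dimensions fully account for the matrix, every residual is zero up to rounding; passing only the first r − 1 columns of loadings gives the residual matrix from which the next dimension is extracted.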


Definition of an Independent Dimension, $C_r$ (An Orthogonal Factor)
(Table 1, row d). The score $C_r$ is a composite, defined as the following
independent cluster domain score:

(9) $C_r = (w_a)\,{}_rC_a + (w_b)\,{}_rC_b + \cdots + (w_{s_r})\,{}_rC_{s_r}$.

The defining variables of the dimension are selected observed variables a, b,
..., $s_r$, taken from all n variables. The C's are their variable-domain scores,
defined by (2). The prescript r means that scores on preceding dimensions
$C_1, \ldots, C_{r-1}$ are held constant, thus establishing the independence of $C_r$.
The w's are weights. The same variable may appear in different dimensions,
though, of course, as different residual scores.

This definition is more general than that written originally by Pearson 
[25], who initiated multidimensional analysis. It encompasses as special 
cases the varieties of cluster and factor analysis given in Table 1. In row d, 
note that in key cluster analysis (TC, CC, PCC, RCC), each dimension is 
defined by a cluster of $s_r$ variables, usually fewer than n. In centroid, principal
axes, and maximum likelihood factor analysis, the defining variables are
indiscriminately a general cluster of all n variables. These latter methods
differ from each other in the values of the weights attached to the different
component variables in (9). In bifactor analysis, dimension $C_1$ is a general
cluster of all n variables, but later dimensions are key clusters. In pivot 










variable (“square root”) analysis, each dimension is defined focally by one
variable only. 

Recall that each C-variable is a variable-domain defined in (2) as a 
composite of z-scores of variables measuring different behaviors but showing 
proportional correlations, hence the use of communalities in the diagonals. 
Were the z-scores in (2) defined as different test samples of the same behavior 
domain, each C-variable would be the construct composite called the “true”
or X,.-score of the variable [39] and reliability coefficients would be diagonal 
entries of the matrix. Were each C-variable defined as the single z-score of 
the defining variable, then the diagonals would be unities. In multidimensional 
analysis, the definition leading to communalities in the diagonal is chosen 
for the reason given under objective B. 

The simplest weighting of the C-variables is to set all w’s to unity in 
(9), as in TC, CC, PCC, RCC, centroid, diagonal, and bifactor analysis. 
This simple summation, as Burt calls it [3, 4], intrinsically weights each
defining variable of the dimension $C_r$ by its proportional contribution to the
variance of $C_r$, that is, by the sum of its communality and its correlations
with the other defining variables. Another choice, characteristic of the
principal axes and maximum likelihood methods, is the computationally
arduous least squares solution for the sets of weights that yield the maximal
sum of the $h_{vr}^2$ values, that is, of the partial communalities.

Partial Communalities (Squared Factor Loadings) (Table 1, row e).
The portion of the variance of a variable v predictable from a dimension $C_r$
is the square of its correlation with $C_r$, called its partial communality, $h_{vr}^2$,
defined in (6). In the unweighted case, from the correlation of sums in the
limit,

(10) $h_{vr}^2 = r_{vC_r}^2 = \dfrac{\left(\sum_i {}_r r_{vi}\right)^2}{\sum_i {}_r h_i^2 + \sum_{i \ne j} {}_r r_{ij}} \qquad (i, j = a, \ldots, s_r)$.

The numerator is simply the square of the sum of the residual correlations
of v with the defining variables of $C_r$. The denominator is simply the sum
over the total submatrix of residual correlations of these defining variables,
including the diagonal residual communalities. When i = v, then ${}_r h_v^2$ is included
in the numerator.

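A residual correlation, from (7) and (8), is

(11) ${}_r r_{vi} = r_{vi \cdot 12\cdots(r-1)} = r_{vi} - \left(h_{v1}h_{i1} + \cdots + h_{v(r-1)}h_{i(r-1)}\right)$,

and similarly for the ${}_r r_{ij}$ terms. A residual communality is, from (6),

(12) ${}_r h_i^2 = h_{i \cdot 12\cdots(r-1)}^2 = h_i^2 - \left(h_{i1}^2 + \cdots + h_{i(r-1)}^2\right)$.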
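Formulas (10) to (12) suggest a simple extraction loop. The following is a minimal sketch, not the computing routine of [41]: it assumes the correlation matrix already carries communality estimates in its diagonal and has been reflected so that the relevant sums are positive, it takes the defining variables of each dimension as given (how to choose them is taken up later), and it returns signed loadings $h_{vr}$ whose squares are the partial communalities of (10). The function names are hypothetical. Making every cluster the full set of n variables gives the centroid special case, $s_r = n$.

```python
import math
from typing import List, Sequence

Matrix = List[List[float]]


def sim_sigma_dimension(resid: Matrix, cluster: Sequence[int]) -> List[float]:
    """Loadings h_vr on one dimension defined by `cluster`, by the Sim-Sigma formula (10).

    `resid` holds residual correlations with residual communalities on the diagonal.
    """
    denom = sum(resid[i][j] for i in cluster for j in cluster)
    scale = math.sqrt(denom)
    return [sum(resid[v][i] for i in cluster) / scale for v in range(len(resid))]


def extract(R: Matrix, clusters: Sequence[Sequence[int]]) -> List[List[float]]:
    """Extract one dimension per cluster, removing each before the next, as in (11) and (12)."""
    n = len(R)
    resid = [row[:] for row in R]
    dims = []
    for cluster in clusters:
        h = sim_sigma_dimension(resid, cluster)
        dims.append(h)
        # Subtract h_v * h_i from every entry, diagonal included (residual communalities).
        for v in range(n):
            for i in range(n):
                resid[v][i] -= h[v] * h[i]
    return dims
```

When the matrix is generated exactly by independent clusters, each extracted dimension recovers the generating loadings.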


If one has chosen the defining variables of C, before the dimension 
analysis as in PCC and RCC analysis, it is not necessary to work out the 
individual residual terms of (11) and (12). Only sums from the raw matrix 
















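plus diagonals and sums of partial communalities on prior dimensions are
necessary. Thus, from (11) the numerator of (10) is

(13) $\sum_i {}_r r_{vi} = \sum_i r_{vi} - \left(h_{v1}\sum_i h_{i1} + \cdots + h_{v(r-1)}\sum_i h_{i(r-1)}\right) \qquad (i = a, \ldots, s_r)$.

When i = v, $h_v^2$ is included in $\sum_i r_{vi}$. The denominator of (10) is

(14) $\sum_i {}_r h_i^2 + \sum_{i \ne j} {}_r r_{ij} = \sum_{i,j} r_{ij} - \left[\left(\sum_i h_{i1}\right)^2 + \cdots + \left(\sum_i h_{i(r-1)}\right)^2\right] \qquad (i, j = a, \ldots, s_r)$,

where $\sum_{i,j} r_{ij}$ is the sum over the raw matrix of the $s_r$ variables, including diagonals.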
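The shortcut of (13) and (14) needs only raw-matrix sums and the loadings on prior dimensions. A minimal sketch with a hypothetical function name, computing the loadings $h_{vr}$ for a preclustered dimension this way; its results should agree with the explicit residual route of (10) to (12).

```python
import math
from typing import List, Sequence

Matrix = List[List[float]]


def shortcut_dimension(
    R: Matrix, prior: Sequence[Sequence[float]], cluster: Sequence[int]
) -> List[float]:
    """Loadings on a preclustered dimension from raw sums, following (13) and (14).

    R: raw correlations with communalities in the diagonal.
    prior: loadings on dimensions 1..r-1, one list of n values per dimension.
    cluster: indices of the defining variables a, ..., s_r.
    """
    n = len(R)
    # For each prior dimension m, the sum of h_im over the cluster.
    prior_sums = [sum(h[i] for i in cluster) for h in prior]
    # Denominator (14): raw submatrix sum minus the squared prior sums.
    denom = sum(R[i][j] for i in cluster for j in cluster) - sum(s * s for s in prior_sums)
    scale = math.sqrt(denom)
    loadings = []
    for v in range(n):
        # Numerator (13), before squaring: raw row sum minus prior contributions.
        num = sum(R[v][i] for i in cluster) - sum(h[v] * s for h, s in zip(prior, prior_sums))
        loadings.append(num / scale)
    return loadings
```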

This simple general formula (10), called “Sim Σ” in Table 1, row e, is
used by all cluster and factor analysis methods except principal axes and
maximum likelihood (to be considered later). Recall that from method to method
it differs only in the value of $s_r$; e.g., in centroid analysis, $s_r = n$.

Terminating Criteria (Salient Dimension Analysis) (Table 1, row f). 
As a simple rational standard for terminating factoring, the writer proposes 
the communality exhaustion criterion. To end factoring by this criterion, one 
estimates the communalities of all the variables at the beginning of the 
analysis. Factoring then proceeds up to the dimension $C_r$ at which the
communalities of the n variables are exhausted.

Recall that the communality $h_v^2$ is basically defined in (3), quite
independent of the dimension analysis, as the variance of variable v predictable
from its variable-domain $C_v$. This magnitude is estimated before factoring is
undertaken by computing formulas given later in the paper, (31) or (32).
After factoring is under way, the variance of v predictable from dimensions
$C_1, C_2, \ldots, C_r$ is the sum of the partial communalities of v up to and
including $h_{vr}^2$, as shown in (6).

Writing the ratio of these two variances, i.e.,

(15) $F_{vr} = \dfrac{h_{v1}^2 + h_{v2}^2 + \cdots + h_{vr}^2}{h_v^2}$,

factoring may be terminated on that dimension $C_r$ at which the numerator
of (15) approaches the denominator, i.e., when

(16) $F_{vr} = \dfrac{h_{v1}^2 + \cdots + h_{vr}^2}{h_v^2} = 1.000$.

To evaluate F for each variable on each dimension as factoring proceeds 
would, however, be a complex procedure. Furthermore, the magnitude of F 
for a given variable is subject to substantial error, both in the initial approxi- 
mation to the denominator term and in the initial dimensional estimate of 
the numerator. Less subject to these errors is the approximate sum (or 
average) of the F-values over all n variables, namely, 










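(17) $T_r = \dfrac{\sum_v \left(h_{v1}^2 + \cdots + h_{vr}^2\right)}{\sum_v h_v^2} \qquad (v = 1, \ldots, n)$.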





Using $T_r$ as a criterion, one stops factoring on the dimension $C_r$ for which
$T_r$ by (17) first equals or exceeds unity, i.e., where

(18) $T_r \ge 1.000$.





In practice, the T-criterion in (18) appears to yield the most salient
dimensions [41]. Consider bias in T: initial estimates of communalities are
usually biased downward, suggesting that a salient terminal dimension
might be rejected by T. This is unlikely because such a dimension would be
the one for which T first exceeds 1.000. The effect of an upward bias would
be to accept nonsalient dimensions. To minimize such an effect, the analyst
may set the criterion a little under unity, say, at .975.
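A minimal sketch of the communality exhaustion criterion. It assumes, as an illustrative reading of (17), that $T_r$ is the ratio of the summed cumulative partial communalities to the summed initial communality estimates, taken over all n variables; the function name is hypothetical. The `threshold` argument lets the analyst use a value a little under unity, such as .975, as suggested above.

```python
from typing import List, Optional, Sequence, Tuple


def t_criterion(
    h2: Sequence[float],
    dims: Sequence[Sequence[float]],
    threshold: float = 1.0,
) -> Tuple[List[float], Optional[int]]:
    """Return (T_r for r = 1..k, first r with T_r >= threshold, or None).

    h2: initial communality estimates h_v^2 for the n variables.
    dims: loadings h_vr, one list of n values per extracted dimension, in order.
    """
    total = sum(h2)
    explained = 0.0
    t_values: List[float] = []
    stop: Optional[int] = None
    for r, loadings in enumerate(dims, start=1):
        # Add this dimension's partial communalities to the running numerator.
        explained += sum(h * h for h in loadings)
        t = explained / total
        t_values.append(t)
        if stop is None and t >= threshold:
            stop = r
    return t_values, stop
```

Because rounding can leave a fully exhausted $T_r$ a hair below 1.000, a threshold slightly under unity is also convenient computationally.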

The T-criterion may reject later dimensions that would be accepted on 
sampling grounds. Many analysts may consider such rejection to be an 
unimportant loss, for such dimensions contribute minor general variance 
and are usually difficult to interpret. 

Significant Dimension Analysis. On the basis of sampling criteria, one 
may wish, however, to accept all significant (as distinguished from salient) 
dimensions. The orthodox F-test procedures applied to the communality 
exhaustion indices represented by (15) and (17) would seem appropriate, 
but their sampling characteristics are not yet known. There remain the various 
significance tests applied by factorists to the distribution of residual correla- 
tions (11) or to the distribution of the square roots of the partial communalities 
(i.e., of the factor loadings) of a given dimension (10). The tests developed by 
Saunders are of special interest because he proposes both types ([6], formulas
44 and 46, p. 300ff).

Reiteration of the Factoring to Converged Communalities (Table 1, row g). 
After the first factoring is complete, the sum of partial communalities by (6) 
may not yield the correct value of the communality of each variable, as in 
artificial or population matrices [40]. In those methods that start with approxi- 
mations, reiteration of the dimensionality analysis on the k dimensions is 
required until convergence is secured, as shown in row g, Table 1. 

The decision as to which decimal place will define convergence may be 
made on arbitrary grounds of salience, say, the third place. On sampling 
grounds, however, one may terminate convergence when for every variable 
the difference between two successive iterated values of its communality by 
(6) becomes less than, say, a third of the standard error of the last iterated 
value of its communality. Treating $h_v$ as a multiple correlation, as in (38),
later, the approximate error is

(19) $\sigma_{h_v} = (1 - h_v^2)/\sqrt{N - (k + 1)} \approx (1 - h_v^2)/\sqrt{N - 2}$.













The magnitude of k, at least 1, is usually trivial relative to N, hence leading 
to the final approximation shown. 
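The convergence rule can be written as a small check. A minimal sketch, assuming the approximate standard error of (19) with k dimensions and the "third of a standard error" tolerance mentioned above, evaluated at the last iterated value; both function names are hypothetical.

```python
import math
from typing import Sequence


def se_communality(h2: float, n_cases: int, k: int) -> float:
    """Approximate standard error (19): (1 - h^2) / sqrt(N - (k + 1))."""
    return (1.0 - h2) / math.sqrt(n_cases - (k + 1))


def converged(
    previous: Sequence[float],
    current: Sequence[float],
    n_cases: int,
    k: int,
    fraction: float = 1 / 3,
) -> bool:
    """True when every change in iterated communality is below `fraction` of its standard error."""
    return all(
        abs(curr - prev) < fraction * se_communality(curr, n_cases, k)
        for prev, curr in zip(previous, current)
    )
```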


D. Determining the Structure of the Interrelationships (Table 1, row h) 


Having determined the dimensionality of the intercorrelations, one may 
relax the condition of independence and select or derive the k dimensions 
that may be oblique to each other and be better defined by the observed 
variables. Key cluster analysis routinely locates those groups of variables 
which delineate the k most nearly independent oblique dimensions.
Corresponding to the independent dimensions $C_1, C_2, \ldots, C_k$ there is the
matched set of oblique dimensions, respectively, $C_I, C_{II}, \ldots, C_K$. Thus
for independent dimension $C_r$ given in (9) there is an oblique dimension $C_R$
defined in simple summation form by the domain score

(20) $C_R = C_a + C_b + \cdots + C_{s_r}$.

Scores on the C-variables that form this composite have the same definition
as in (9), but they are not residual scores as in (9). Geometrically, dimension
$C_R$ is an oblique subcentroid in k-space.

If the analyst wishes to check on the clusterings indicated by the 
dimensional analysis with an eye, perhaps, to a possible reclustering of the 
variables including those that had remained unclustered, he may employ a 
geometric model (for an illustration, see [41], Fig. 1). This model takes the k 
dimensions as independent axes, and each variable as a point on them. The 
coordinate of each variable on any axis $C_r$ is its correlation, $r_{vC_r}$ (unaugmented
factor loading), which by (10) is

(21) $r_{vC_r} = h_{vr}$.


The resulting interior model of variable-points is perceptually not as 
descriptive of the interrelationships among them as the surface model, given 
by plotting each variable-domain $C_v$ by its augmented correlation, this being,
from the correlation of sums in the limit,

(22) $r_{C_v C_r} = h_{vr}/h_v$.


The surface model, in which all variable-domains are points at distance 1.00
from the origin, has the perceptual merit of revealing directly as a surface
separation of points the relationship between each variable-domain $C_v$ and
any one of the other variable-domains $C_i$. This important “common factor
correlation” can, however, be computed directly from the matrix and the
communalities, i.e., from the correlation of sums in the limit,

(23) $r_{C_v C_i} = r_{vi}/(h_v h_i)$.
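Relations (22) and (23) turn loadings and correlations into coordinates of the surface model by dividing by the square roots of the communalities. A minimal sketch with hypothetical function names; communalities are assumed positive.

```python
import math


def augmented_loading(h_vr: float, h2_v: float) -> float:
    """(22): correlation of variable-domain C_v with dimension C_r, h_vr / h_v."""
    return h_vr / math.sqrt(h2_v)


def domain_correlation(r_vi: float, h2_v: float, h2_i: float) -> float:
    """(23): 'common factor correlation' of variable-domains C_v and C_i, r_vi / (h_v h_i)."""
    return r_vi / math.sqrt(h2_v * h2_i)
```

For instance, two variables correlating .42 with communalities .49 and .36 have variable-domains that correlate 1.00, so they fall at the same point on the surface model.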


Simple Cluster Structure (Rotated Primary Factors). The finally selected 
cluster domains of type (20) are the most nearly independent k dimensions 










evident in the data. The degree of their interdependence is given by their
intercorrelations. For any two such generally oblique cluster domains, $C_I$ and
$C_J$, their correlation is given, from the correlation of sums in the limit, as

(24) $r_{C_I C_J} = \dfrac{\sum r_{ij}}{\sqrt{\sum r_{ii}}\,\sqrt{\sum r_{jj}}}$,

where $\sum r_{ii}$ is the sum of r's in the submatrix of the $s_I$ variables of $C_I$, including
diagonal communalities, $\sum r_{jj}$ is the same for the submatrix of $s_J$ variables
defining $C_J$, and $\sum r_{ij}$ is the sum of the $s_I s_J$ coefficients in their cross-correlation
submatrix. Recall that, as stated under (9), a given variable may appear
in more than one oblique dimension, a situation which would, of course,
increase obliqueness.

As an aid in interpreting an oblique dimension $C_I$, one may compute
the correlation of each known observed sample variable v with the dimension
(rotated factor loading). By the correlation of sums in the limit it is

(25) $r_{vC_I} = \dfrac{\sum_i r_{vi}}{\sqrt{\sum r_{ii}}} \qquad (i = a, \ldots, s_I)$.


But of more interest theoretically is the correlation of $C_I$ with each kind of
general variation of which v is taken as a test sample, namely, its
variable-domain $C_v$ in (2). This correlation (augmented rotated factor loading) is
simply, from the correlation of sums in the limit,

(26) $r_{C_v C_I} = r_{vC_I}/h_v$.
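Formulas (24) to (26) need only submatrix sums. A minimal sketch, assuming R carries communalities in its diagonal and clusters are given as lists of variable indices; the function names are hypothetical. For a matrix generated by a single dimension, any two cluster domains correlate exactly 1.0.

```python
import math
from typing import List, Sequence

Matrix = List[List[float]]


def block_sum(R: Matrix, rows: Sequence[int], cols: Sequence[int]) -> float:
    """Sum of the submatrix of R over the given rows and columns."""
    return sum(R[i][j] for i in rows for j in cols)


def cluster_domain_correlation(R: Matrix, A: Sequence[int], B: Sequence[int]) -> float:
    """(24): correlation of the oblique cluster domains defined by clusters A and B."""
    return block_sum(R, A, B) / math.sqrt(block_sum(R, A, A) * block_sum(R, B, B))


def variable_domain_loading(R: Matrix, v: int, A: Sequence[int]) -> float:
    """(25): correlation of observed variable v with cluster domain A (rotated loading)."""
    return block_sum(R, [v], A) / math.sqrt(block_sum(R, A, A))


def augmented_rotated_loading(R: Matrix, v: int, A: Sequence[int]) -> float:
    """(26): the loading of (25) divided by h_v, read from the diagonal of R."""
    return variable_domain_loading(R, v, A) / math.sqrt(R[v][v])
```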


Difficult problems arise in simple cluster structure analysis in those 
methods in which the defining variables of the independent C, dimensions 
are total clusters of all n variables. As shown in Table 1, row h, columns 7,
8, 9, these dimensions must be rotated [see 9] to meaningful defining orthogo- 
nal or oblique clusters. Orthodox factor analysts following Thurstone [34] 
propose graphical rotation—a cumbersome, subjective procedure, admittedly 
an art [6]. Recent attempts have been made to achieve rotation by objective 
analytic methods [5, 18, 24, 26, 28]. Rotation, an unnecessary burden, is not 
required when the dimensions are defined by key clusters. 


E. Scores on Oblique Dimensions (Table 1, row i)


Ideally, the best estimate of an individual's sample score $C_{sI}$ on any
dimension $C_I$ is the regression of $C_I$ on the n variables, i.e.,

(27) $C_{sI} = \sum_a \beta_a z_a \qquad (a = 1, \ldots, n)$.


Such an estimate is so arduous to compute that in most factor analyses the 
important job of measuring persons on the reduced dimensions is rarely 
tackled, and a main benefit of the analysis is lost. 













When, however, a dimension is defined by a key cluster, a good estimate
may be secured from the cluster score, namely, the simple sum of the z-scores
of the defining variables of $C_I$, i.e.,

(28) $C_{sI} = z_a + z_b + \cdots + z_{s_I}$.


In this composite each defining variable takes an intrinsic weight proportional 
to the sum (or mean) of its correlations with the remaining defining variables. 
In Table 1, row i, the simple summation, labelled “Σ,” may be used in all but
the general cluster methods (columns 7, 8, 9).

The cluster domain validity of the observed cluster score (28) is its
correlation with the full domain score, $C_I$, of type (20). By the sums formula in the
limit it is

(29) $r_{C_{sI} C_I} = \sqrt{\dfrac{\sum h_i^2 + \sum r_{ij}}{s_I + \sum r_{ij}}} \qquad (i, j = a, \ldots, s_I;\ i \ne j)$.

Note that the numerator in (29) is simply the sum over the submatrix of $C_I$
including diagonal communalities, and the denominator is the same except
with unities in the diagonals.

The relationships between cluster scores that fallibly measure the final
k oblique dimensions are given by their intercorrelations. Between any two
such scores, $C_{sI}$ and $C_{sJ}$, by the sums formula this correlation is

(30) $r_{C_{sI} C_{sJ}}$ = same as (24) but with unities in the diagonals.
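Formulas (29) and (30) can be sketched the same way, assuming again that R carries communalities in its diagonal; the unit-diagonal entries needed for (30) are substituted inside the function. The function names are hypothetical.

```python
import math
from typing import List, Sequence

Matrix = List[List[float]]


def cluster_score_validity(R: Matrix, A: Sequence[int]) -> float:
    """(29): correlation of the summed z-score cluster score with its full domain.

    Numerator: submatrix sum with communalities in the diagonal.
    Denominator: the same sum with unities in the diagonal.
    """
    off = sum(R[i][j] for i in A for j in A if i != j)
    comm = sum(R[i][i] for i in A)
    return math.sqrt((comm + off) / (len(A) + off))


def cluster_score_correlation(R: Matrix, A: Sequence[int], B: Sequence[int]) -> float:
    """(30): formula (24) applied with unities in the diagonal."""

    def entry(i: int, j: int) -> float:
        return 1.0 if i == j else R[i][j]

    ab = sum(entry(i, j) for i in A for j in B)
    aa = sum(entry(i, j) for i in A for j in A)
    bb = sum(entry(i, j) for i in B for j in B)
    return ab / math.sqrt(aa * bb)
```

Because the cluster score carries specific and error variance that the domain does not, (29) is below 1.0 even when all defining variables measure a single dimension.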


Special Cases of Multidimensional Analysis 


Key Cluster Analysis: Total Communality (TC) and Cumulative Communality
(CC) (Table 1, columns 1, 2)


The TC and CC methods directly apply the general formulations out- 
lined above. As shown in Table 1, columns 1 and 2, the correlation matrix is 
initially made positive (row a) by conventional reflection methods (see [10], 
Table 16.13) in order to guarantee that variables chosen to define a given 
dimension will show correlations of positive proportionality in (4). 

An electronic computer is required in TC analysis to solve for the 
communalities by a simultaneous quadratic formula (row c). Difficulty has, 
however, been experienced in achieving a solution in empirical matrices 
[16, 17]. The CC method has therefore been developed [41] to meet the 
possible failure of solutions by the quadratic formula. CC analysis starts with 
approximations to communalities, and dimensionality analysis is reiterated 
until convergence is secured. CC analysis is thus an alternative to TC: it
provides a solution in all matrices, and it may be programmed either for
electronic or for desk calculator computation.

Solution of the communalities in TC analysis is based on the fact that 
the communality of any variable v is the squared multiple correlation between
v and the remaining n − 1 variable-domains ([40], formula 11),


$$h_v^2 = R^2_{v \cdot a\,b \ldots n} \qquad (i = a, \ldots, n;\ i \ne v). \tag{31}$$


Solving for the communalities of all variables requires a simultaneous solution 
by reiteration of the n quadratics of type (31). 
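Reiterating the n quadratics of type (31) amounts to a fixed-point iteration. The sketch below makes four assumptions:

- The predictor domains have R as their covariance matrix, with the current communalities in the diagonal.
- Their covariances with v are the ordinary r's.
- A pseudo-inverse is used, because that domain matrix is singular whenever the rank is below n − 1.
- Convergence is not guaranteed. This is consistent with the difficulties reported in [16, 17].

All names are hypothetical.

```python
import numpy as np


def smc_on_domains(R, h2):
    """One pass of (31): for every variable v, the squared multiple
    correlation of v with the domains of the remaining n - 1 variables.

    The predictors' covariance matrix is R with the current communalities
    in its diagonal; their covariances with v are the ordinary r_vi.
    """
    R = np.asarray(R, dtype=float)
    h2 = np.asarray(h2, dtype=float)
    n = R.shape[0]
    updated = np.empty(n)
    for v in range(n):
        others = [i for i in range(n) if i != v]
        C = R[np.ix_(others, others)].copy()
        np.fill_diagonal(C, h2[others])
        r_v = R[v, others]
        # Pseudo-inverse: C is singular whenever the rank is below n - 1.
        updated[v] = r_v @ np.linalg.pinv(C) @ r_v
    return updated


def solve_tc_communalities(R, h2_start, tol=1e-6, max_iter=500):
    """Reiterate (31) until no communality changes by more than tol.

    Returns (communalities, iterations used, converged flag).
    """
    h2 = np.asarray(h2_start, dtype=float)
    for iteration in range(1, max_iter + 1):
        updated = smc_on_domains(R, h2)
        if np.max(np.abs(updated - h2)) < tol:
            return updated, iteration, True
        h2 = updated
    return h2, max_iter, False
```

For an exactly single-factor matrix, the true communalities are a fixed point of this iteration. The returned flag reports whether the iteration settled within `max_iter` passes.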

Various initial approximations to communalities required in the CC
method are available [40, 45, 46]. On domain sampling grounds, the preferred
estimate of $h_v^2$ is “Approximation B” ([40], formulas 29, 30), the Spearman
formula, computed from a cluster of reference variables whose correlations
are most nearly proportional to those of v. Here,


$$h_v^2 = \frac{\sum r_{vi}\, r_{vj}}{\sum r_{ij}} \qquad (i, j \ne v;\ i < j), \tag{32}$$


where i and j both range over the three variables showing the highest
$P^2_{vi}$ and $P^2_{vj}$ values, respectively, by (5).
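A short sketch of Approximation B, (32), follows. The P² index of (5) is not reproduced in this excerpt, so the sketch takes the three reference variables as an explicit argument rather than selecting them. The function name is hypothetical. In a single-factor matrix every r_ij equals l_i l_j, so the ratio returns the exact communality of v.

```python
from itertools import combinations

import numpy as np


def spearman_h2(R, v, refs):
    """Approximation B, (32): sum of r_vi * r_vj over pairs i < j of the
    reference variables, divided by the sum of r_ij over the same pairs."""
    pairs = list(combinations(refs, 2))
    numerator = sum(R[v, i] * R[v, j] for i, j in pairs)
    denominator = sum(R[i, j] for i, j in pairs)
    return numerator / denominator


loadings = np.array([0.8, 0.7, 0.6, 0.5])   # hypothetical one-factor data
R = np.outer(loadings, loadings)
np.fill_diagonal(R, 1.0)
h2_v = spearman_h2(R, v=0, refs=[1, 2, 3])
```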

In both methods the defining variables of each dimension anchor on
a selected pivot variable that appears to show both relatively high and
relatively low correlations with other variables. Such a variable would usually
center the cluster obliquely to the clusters defining other dimensions. To locate
the pivot variable, a measure of “pivotness” of each of the n variables is first
computed, namely, the variance of its squared r’s. The pivot variable is then
that variable v whose


$$\operatorname{var}_i\left(r_{vi}^2\right) \text{ is the maximum} \qquad (i = 1, \ldots, n;\ i \ne v). \tag{33}$$
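The pivotness measure of (33) is straightforward to compute. Below is a minimal sketch; the function name and the small example matrix are hypothetical. NumPy's default population variance is used. The choice of divisor cannot change which variable is the maximum, because every variable contributes the same number, n − 1, of squared r's.

```python
import numpy as np


def pivot_variable(R):
    """(33): return the index of the variable whose squared correlations
    with the other variables have the largest variance, plus all scores."""
    R = np.asarray(R, dtype=float)
    n = R.shape[0]
    pivotness = np.empty(n)
    for v in range(n):
        r_squared = np.delete(R[v], v) ** 2   # drop the diagonal entry
        pivotness[v] = r_squared.var()
    return int(np.argmax(pivotness)), pivotness


# Hypothetical matrix: variable 0 correlates .6 with variables 1 and 2
# and zero with variables 3 and 4, so it shows both high and low r's.
R = np.eye(5)
R[0, 1] = R[1, 0] = 0.6
R[0, 2] = R[2, 0] = 0.6
pivot, scores = pivot_variable(R)
```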


A quicker and probably less sensitive means of selecting the pivot variable 
is to choose the one with the highest residual communality as given in (12). 
For desk calculator work, selecting the pivot variable from a correlation distribution table
is satisfactory, though it entails some subjective elements ([41], Table 2). 
Around the pivot variable, v, one collects the remaining defining variables
of the dimension. These are the variables with the highest indices of
proportionality with v. At least three such variables are selected. If these
variables are called, in order of magnitude of $P^2_{vi}$, $i_1$, $i_2$, $i_3$, then
any additional variable $i$ may also be included in the cluster if its $P^2_{vi}$
value is equal to or above .81 and also lies within twice
$P^2_{vi_1} - P^2_{vi_3}$, that is, if

$$P^2_{vi} \ge .81 \quad \text{and} \quad P^2_{vi_1} - P^2_{vi} \le 2\left(P^2_{vi_1} - P^2_{vi_3}\right). \tag{34}$$


Partial communalities, row e, are computed by the general formula
(10). In TC analysis the factoring process is terminated by the T-criterion
after a single factoring pass. But in CC analysis, since approximations
initiate the analysis, the first factoring process is terminated by the
T-criterion, then new values of the communalities are computed from (6), 
and the factoring process is reiterated until the communalities converge 
(rows f, g). In both TC and CC analysis, the dimensional analysis locates the 
k oblique dimensions, row h, and the scores of individuals on the dimensions,
row i, follow the formulations given earlier under the general method.


Key Cluster Analysis: Preclustered Cumulative Communality (PCC) and
Rational Cumulative Communality (RCC) (Table 1, columns 3, 4)


Recall that in TC and CC analyses the cluster of variables selected to
define a dimension is chosen during the dimensional analysis. One may,
however, choose them prior to factoring, empirically in PCC, rationally in 
RCC analysis. Preselection of clusters makes multidimensional analysis a 
quick desk calculator operation because the complete residual correlation 
matrices essential to CC analysis are not required. PCC and RCC analyses 
are procedurally identical after the analyst has clustered the variables (see 
Table 1, columns 3 and 4). 

In PCC analysis (for a recent illustration see [38]), one empirically
groups the n variables into k’ clusters, $C_1, \ldots, C_{k'}$. Each cluster
is made to be as “tight” as possible, i.e., is composed of variables whose 
correlations are maximally proportional by (5). Some variables may remain 
unclustered, but their number is kept as small as possible. As an aid in select- 
ing the groupings one may use a correlation distribution table. 

In RCC analysis, by contrast, the rational groupings stem from the theory
from which the analyst generated the n variables under study. An investigator
commonly conceives the n variables to sample different behavior domains or 
properties of the individuals, such as the facets of Guttman [11]. The s 
variables that fall in each such theoretical subgroup are a rational cluster. 
The n variables will usually be organized in k’ such clusters, though a few 
may remain as isolates. 

As in CC analysis, one starts both PCC and RCC analyses by computing
approximations to the communalities, preferably by formula (32). Thereafter
the work is procedurally identical to the CC analysis, except that only
mean residuals are needed. The mean residual correlation of a variable v
with any cluster C is, from (13),


$$\bar{r}_{vC} = \frac{1}{s}\sum_i r_{vi} \qquad (i = a, \ldots, s). \tag{35}$$


The mean residual communality of the variables that compose C is, from
(12),

$$\bar{h}^2_C = \frac{1}{s}\sum_i h_i^2 \qquad (i = a, \ldots, s). \tag{36}$$

The quickest means of choosing the pivotal defining cluster of any dimen-
sion is to select the cluster with the largest mean residual communality as
given in (36). The partial communalities are then computed by the simple
summation formula (10). As shown in Table 1, the remainder of PCC and RCC
analyses is the same as in CC. The final k dimensions are thus defined by a
selection from the k’ original clusters.
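The bookkeeping of (35) and (36) is small enough to sketch directly. The code assumes that residual correlations and residual communalities are already available as arrays. One choice is not settled by the text and is labelled as an assumption in the code: when v itself belongs to the cluster, the sketch skips it in (35). All names and numbers are hypothetical.

```python
import numpy as np


def mean_residual_correlation(res_R, v, members):
    """(35): mean residual correlation of variable v with a cluster.
    Assumption: v itself is skipped if it belongs to the cluster."""
    others = [i for i in members if i != v]
    return float(np.mean([res_R[v, i] for i in others]))


def mean_residual_communality(res_h2, members):
    """(36): mean residual communality of a cluster's members."""
    return float(np.mean(np.asarray(res_h2, dtype=float)[members]))


def pivot_cluster(res_h2, clusters):
    """Index of the cluster with the largest mean residual communality."""
    scores = [mean_residual_communality(res_h2, c) for c in clusters]
    return int(np.argmax(scores))


clusters = [[0, 1, 2], [3, 4, 5]]            # hypothetical preclustering
res_h2 = [0.5, 0.6, 0.7, 0.2, 0.3, 0.1]      # hypothetical residual communalities
res_R = np.full((6, 6), 0.2)                 # hypothetical residual correlations
print(pivot_cluster(res_h2, clusters))
```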


A complication may arise if one should run out of clusters before
dimensionality has been completely determined. If one wishes to determine
dimensionality precisely, one would compute the n × n residual correlation
matrix and perform the CC procedures on it and on any later residual matrices
that are necessary. PCC and RCC analyses can be made as precise as CC 
analysis. To do so the analyst forms new estimates of the communalities by 
(6) and then, as in CC analysis, reiterates the factoring procedures until 
communalities converge. 

Abridged PCC and RCC (Multiple Group Factor Analysis or ‘Poor 
Man’s Cluster Analysis’). The analyst need only spend a few hours of work 
on a correlation matrix if he is satisfied with an approximate multidimensional 
analysis. After reflecting the variables, preclustering them, and approximating 
their communalities as in PCC analysis proper he can compute the correla- 
tions between the resulting k’ cluster domains by (24). He may, if he wishes, 
even skip the step of approximating communalities, leave the diagonal vacant,
and use mean r’s instead of Σr’s in (24).

The result is a k’ × k’ matrix with diagonal elements of unity. The
correlation between each variable v and a given domain C is then estimated
by (25); if communalities are not used, he would use mean r’s instead of Σr
values in (25). These calculations result in an n × k’ matrix. From a study
of these data he may make a reasonable estimate of the dimensionality,
interpret the oblique dimensions, and compute cluster scores. If he wishes a
more accurate estimate of the dimensionality and of the most oblique k
clusters, he can factor the k’ × k’ matrix by the diagonal method of factoring
(see “Pivot variable analysis” below). By this means he quickly locates the
k most nearly independent cluster dimensions.
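The abridged computation builds two matrices: the k’ × k’ interdomain matrix and the n × k’ matrix of variable-domain correlations. Formulas (24) and (25) are not reproduced in this excerpt. The sketch therefore assumes the sums formula with communalities in the diagonal, which matches how the text describes (29) and (30). The mean-r shortcut with a vacant diagonal is not shown. Names and data are hypothetical.

```python
import numpy as np


def cluster_domain_matrices(R, h2, clusters):
    """Abridged PCC under an assumed sums-formula reading of (24), (25).

    Returns phi (k' x k' interdomain correlations, unit diagonal) and
    struct (n x k' correlations of each variable with each domain).
    """
    M = np.array(R, dtype=float)
    np.fill_diagonal(M, h2)                       # communalities in diagonal
    S = np.array([[M[np.ix_(a, b)].sum() for b in clusters] for a in clusters])
    domain_sd = np.sqrt(np.diag(S))               # sd of each domain score
    phi = S / np.outer(domain_sd, domain_sd)
    struct = np.column_stack([M[:, a].sum(axis=1) for a in clusters]) / domain_sd
    return phi, struct


# Hypothetical two-cluster data with uncorrelated clusters.
loadings = np.array([0.8, 0.7, 0.6, 0.7, 0.6, 0.5])
R = np.zeros((6, 6))
R[:3, :3] = np.outer(loadings[:3], loadings[:3])
R[3:, 3:] = np.outer(loadings[3:], loadings[3:])
np.fill_diagonal(R, 1.0)
phi, struct = cluster_domain_matrices(R, loadings**2, [[0, 1, 2], [3, 4, 5]])
```

In this example each variable's correlation with its own cluster domain equals its loading, and the two domains are uncorrelated.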

Orthodox PCC Analysis (Group, Grouping, and Multiple Group Methods 
of Factoring). In 1939, the writer published dimensional analysis by the 
PCC method in approximately the form presented here ([37], Sec. 7); the
quick abridged form was also given (Sec. 5, and Analyses 14, 15a for centroids). 
Five years later Holzinger [12] and then Thurstone [33, 35] presented the 
abridged form. Thurstone labels it the multiple group method of factoring. 
At the end of abridged analysis he rotates the k oblique dimensions to orthogo- 
nal positions in order to compute residual correlations, and to see if further 
dimensions might be necessary. In his book, Thurstone [34] added the group 
and the grouping methods (see also [6], ch. 11). They differ from the 1939
and current PCC methods in grouping variables into clusters on the basis
of the absolute magnitudes of correlations rather than the proportionality
of correlations, our P² criterion.


Pivot Variable (PV) Cluster Analysis (Diagonal or Square Root Factor Analysis) 
(Table 1, column 5) 


In PV analysis (see Table 1, column 5), each variable i among the n
variables may be selected as the central sample variable of an oblique cluster,
$C_i$. Defined generally in (20), $C_i$ here consists of three variables only.
The other two variables are those which, after the matrix is reflected, yield
the highest $P^2$ values with i. On domain sampling principles one may thus
conceptualize k’ = n preclustered domains, $C_1, \ldots, C_i, \ldots, C_n$. After
determining approximations to the communality of each variable from its
reference variables by (32), one computes the correlations between the n
domains by (24). Then a dimensional analysis of the n × n matrix of $r_{C_iC_j}$
coefficients, with diagonal elements of unity, is performed. The cluster with
the highest column sum defines the first dimension; locating the pivot cluster of
later dimensions requires only computing residual communalities and selecting
the one with the highest value. With an electronic computer, however, one
can instead compute residual matrices and more sensitively select the pivot
variable by (33). If the defining pivot cluster of any dimension is denoted
$C_p$, then the augmented partial communality of an oblique cluster $C_i$ is a
special case of (10), i.e.,

$$h^2_{C_i} = \frac{r^2_{C_iC_p}}{1 - \sum h^2_{C_p}}. \tag{37}$$

The numerator of (37) calls only for the residual interdomain correlations
of the selected pivot cluster with each of the remaining n − 1 clusters; only
a simple n × 1 matrix of such residual correlations is necessary. Factoring is
terminated when the sum of the augmented partial communalities of all
variables over all dimensions, that is, the numerator of the T-criterion, first
becomes equal to or greater than its denominator term.
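The diagonal (square root) method of factoring can be sketched in a few lines. The first pivot is the row with the highest column sum. Later pivots are the rows with the highest residual communality, that is, the largest residual diagonal value. Each partial communality is the squared residual correlation with the pivot divided by the pivot's residual diagonal value, which is the standard diagonal-method rule. The T-criterion itself is not stated in this excerpt, so the sketch stops after a caller-chosen number of dimensions or when the pivot's residual diagonal falls below a tolerance. Both are assumptions, and the names are hypothetical.

```python
import numpy as np


def diagonal_factoring(M, max_dims, tol=1e-8):
    """Diagonal (square root) factoring of a symmetric matrix M.

    Dimension 1 pivots on the row with the largest column sum; later
    dimensions pivot on the largest residual diagonal value (residual
    communality). Partial communalities are r_ip**2 / r_pp, computed from
    residual values.
    """
    res = np.array(M, dtype=float)
    n = res.shape[0]
    pivots, columns = [], []
    for d in range(max_dims):
        p = int(np.argmax(res.sum(axis=0) if d == 0 else np.diag(res)))
        if res[p, p] <= tol:                 # nothing left to factor on this pivot
            break
        a = res[:, p] / np.sqrt(res[p, p])   # correlations with dimension d
        pivots.append(p)
        columns.append(a)
        res = res - np.outer(a, a)           # next residual matrix
    loadings = np.column_stack(columns) if columns else np.zeros((n, 0))
    return pivots, loadings**2, res


# Hypothetical rank-2 matrix with communalities in the diagonal.
L = np.array([[0.9, 0.0], [0.8, 0.3], [0.2, 0.9], [0.1, 0.8]])
M = L @ L.T
pivots, partial_h2, residual = diagonal_factoring(M, max_dims=3)
```

On an exactly rank-2 matrix, two pivots exhaust the matrix. The third pass then finds no residual communality left and stops.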

Having now located k pivot clusters by the dimensional analysis, the 
analyst assigns each of the remaining n − k variables to a final set of k
oblique clusters. Each may be assigned to that pivot cluster which defines the 
dimension on which the variable in question has its highest partial 
communality. Tighter clusters may, however, be grouped by criterion (34). 
An illustration of PV diagonal factoring procedures applied to an inter-
domain matrix is given elsewhere ([38], Appendix D). The potentialities of
PV analysis should be explored. It is a rapid means of estimating k, the 
dimensionality (rank) of the matrix, and hence it could precede maximum 
likelihood analysis where foreknowledge of the approximate value of k is 
desirable. 
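The assignment rule described above is simple to sketch. Each non-pivot variable joins the pivot cluster of the dimension on which its partial communality is highest. For simplicity the sketch treats each pivot cluster as represented by a single pivot index. The tighter grouping by criterion (34) is not attempted. Names and numbers are hypothetical.

```python
import numpy as np


def assign_to_pivot_clusters(partial_h2, pivots):
    """Put each non-pivot variable with the pivot of the dimension on which
    its partial communality is highest."""
    partial_h2 = np.asarray(partial_h2, dtype=float)
    clusters = {p: [p] for p in pivots}
    for v, best_dim in enumerate(partial_h2.argmax(axis=1)):
        if v in clusters:             # pivots already head their clusters
            continue
        clusters[pivots[best_dim]].append(v)
    return clusters


partial_h2 = [[0.81, 0.00], [0.04, 0.64], [0.49, 0.09], [0.01, 0.36]]
groups = assign_to_pivot_clusters(partial_h2, pivots=[0, 1])
print(groups)
```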

PV analysis would give precise results if one started with correct values 
of the communalities. But it may be employed to find such values, as follows. 
After the first factoring to determine the k most nearly independent pivotal 
clusters, these clusters may be used as a constant reference set of predictors 
to compute the communality of each variable, v. Elsewhere it has been shown 
([40], formula 44) that $h_v^2$ is the squared multiple correlation between v and
a set of k oblique cluster domains, i.e.,


$$h_v^2 = R^2_{v \cdot C_1 C_2 \cdots C_k}. \tag{38}$$

From knowledge of the k oblique clusters by initial PV analysis, and with 
initial trial values of the constants required in (38) computed from (32), 
(24), and (25), simultaneous solutions of all n communalities, reiterated to
convergence, can be programmed electronically. Such a program may be 
integrated with a periodic replication of PV analysis in order to discover 
whether the increased precision of the communalities produces changes in 
dimensionality. 
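Once the correlations of each variable with the k cluster domains and the interdomain correlations are in hand, (38) is an ordinary squared multiple correlation. The sketch below computes it for all variables at once. The example is a hypothetical case in which the cluster domains coincide with oblique factors having pattern P. The variable-domain correlations are then PΦ, and (38) reduces to the diagonal of PΦP′. Names are hypothetical.

```python
import numpy as np


def communalities_from_clusters(S, Phi):
    """(38): h_v^2 as the squared multiple correlation of each variable
    with k oblique cluster domains.

    S   : n x k correlations of the variables with the cluster domains
    Phi : k x k correlations among the cluster domains
    """
    S = np.asarray(S, dtype=float)
    Phi = np.asarray(Phi, dtype=float)
    weights = np.linalg.solve(Phi, S.T)        # k x n regression weights
    return np.einsum("vk,kv->v", S, weights)   # s_v' Phi^-1 s_v for each v


P = np.array([[0.7, 0.0], [0.6, 0.2], [0.0, 0.8], [0.1, 0.5]])  # hypothetical pattern
Phi = np.array([[1.0, 0.4], [0.4, 1.0]])                       # oblique domains
h2 = communalities_from_clusters(P @ Phi, Phi)
```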

Orthodox PV Analysis. Diagonal factoring, one of the oldest factoring
methods and the one used in PV analysis, has recently been relabelled square
root factor analysis [44]. In orthodox practice the analyst pivots a dimension on a single
variable by arbitrarily inserting unities in the diagonal and factoring the 
unreflected r matrix. This practice prevents one from determining the di- 
mensionality of the matrix, rests the partial communalities rather unstably 
on the coefficients of the pivot variable alone, and leaves unclear the oblique 
cluster structure of the variables. The method may, however, be useful in 
studies with large n as a preliminary analysis to locate the most promising 
predictors of a criterion. 


General and Key Cluster Analysis (Bifactor and Second-order Factor Analysis) 
(Table 1, column 6) 


Historically, the urge to discover one general dimension in cognitive 
behaviors provided the impetus to the ultimate development of multi- 
dimensional analysis. It produced the two-factor theory of Spearman [29, 30], 
and its subsequent generalization by Holzinger and Harman [13] to bifactor 
analysis. 

Applying domain sampling principles to this case, one defines the first 
dimension as a general domain score on a full battery of all n variables; 
thereafter each dimension is a residual score on an empirically discerned key 
cluster. In Table 1, column 6, notice that in bifactor analysis the decisions
follow the same pattern as in CC analysis except in one particular, the
definition of the dimensions (row d). Here the deviation concerns only the
first dimension, $C_1$, which is defined in general formula (9) as the sum of
scores on all n variable domains, i.e., $s_1 = n$. In the first residual correlation
matrix and thereafter the regular CC procedure is carried through, each
subsequent dimension being defined by the s variables of a key cluster.

Orthodox Bifactor Analysis. As illustrated by Holzinger and Harman
[13], the orthodox procedure is applied to predominantly positive matrices
that may not require reflection. The variables are preclustered into k’ clusters,
each consisting of variables which (in our terms) show high P² values with
each other. One of these clusters is a general cluster, consisting of one variable
drawn from each of the remaining k’ − 1 clusters. The first dimension, $C_1$,
is defined by this general cluster and is called a general factor, g, the first
partial communalities of all n variables on it being called their squared
g-saturations. In the first residual matrix and thereafter, the dimensions are
successively defined in turn as each of the remaining k’ − 1 clusters (group
factors). The partial communalities are computed by approximation (32),
but for the defining variables of each key cluster only. Zero partial
communalities are by fiat assigned to variables in other clusters. This procedure
is computationally unwieldy. Results from it will correspond closely to those
more efficiently achieved by the CC method, modified as in Table 1, column 
6, by defining the first dimension as a general cluster domain, the remaining 
dimensions as key clusters; preclustering is unnecessary. 

Orthodox Second-order Factor Analysis. In Thurstone centroid analysis, 
when a generally positive matrix of correlations between primary factors 
results from rotation, the correlations between these first-order factors may 
then be subjected to a new centroid analysis on one dimension only. This 
dimension is termed a second-order factor. On domain sampling principles 
the k correlated primaries represent hypothetical oblique clusters which, 
unlike the oblique clusters discovered in key cluster analysis, are normally 
poorly defined by actual variables. The second-order factor is simply a com- 
posite general cluster domain—a battery score on the k oblique clusters. This 
general composite is difficult to interpret because of its vague omnibus 
character and because of the complex redundancy of some variables that are 
common to two or more primaries. 

A cleaner general composite, if desired, would be secured by a CC
analysis designed as described above, namely, by defining the first dimension,
$C_1$, as a composite domain of all n variables without redundancy, and later
dimensions as key clusters. To illustrate that such a first dimension corresponds
closely to the Thurstone second-order dimension (sometimes called g), the
writer compared the correlations between the 11 WAIS variables and the
Thurstone second-order factor ([7], Table 5, 18-19 yr. olds) with their
correlations with the $C_1$ dimension defined as a nonredundant general
cluster domain. The P² value of the paired columns of correlations was unity
to the second decimal place.

Second-order analysis can be extended, of course, to more than a single 
dimension and to matrices with positive and negative correlations between 
the first-order factors. This problem is more fruitfully approached, however, 
under higher-order composites of oblique clusters discussed in the last section 
of this paper (see “Designed reanalyses”).


General Cluster Analysis (Centroid-Simple Structure, Principal Axes, Maximum
Likelihood Factor Analysis) (Table 1, columns 7, 8, 9)


In the remaining group of methods each dimension C, is defined suc- 
cessively as general battery residual scores on all n variable domains, un- 
weighted or weighted. 

Unweighted General Clusters (Centroid—Simple Structure Factor Analysis) 
(Table 1, column 7). If the analyst wishes to define each dimension as the
total unweighted residual domain score on the n variables, he sets s = n in
(9) and all weights equal. The procedures of CC analysis are then called for
down to structure analysis, as shown in Table 1, column 7. As pointed out
earlier, the indiscriminate use of all n variables requires graphical or analytic
rotational methods to describe the simple cluster structure. Following
Thurstone, the simple-structure factorist who uses graphical methods usually
does not place the oblique rotated dimensions through oblique clusters of
variables; rather, the rotated dimensions bound the variables. As a result,
the calculation of individuals’ scores on these oblique dimensions or factors
requires the complex regression equation given in (27).
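Setting s = n with equal weights in (9) gives the classical centroid dimension. By the sums formula, each variable's correlation with the unweighted composite of all n variable domains is its column sum divided by the square root of the grand sum of the reduced matrix. The sketch below extracts one such dimension and its residual matrix. It assumes communalities are already in the diagonal and that any needed reflection has been done. The function name is hypothetical.

```python
import numpy as np


def centroid_dimension(R_reduced):
    """One unweighted general-cluster (centroid) dimension, s = n.

    Each variable's correlation with the unweighted composite of all n
    variable domains is its column sum divided by the square root of the
    grand sum of the reduced matrix (communalities in the diagonal).
    Signs are assumed to have been reflected beforehand.
    """
    R_reduced = np.asarray(R_reduced, dtype=float)
    column_sums = R_reduced.sum(axis=0)
    loadings = column_sums / np.sqrt(column_sums.sum())
    residual = R_reduced - np.outer(loadings, loadings)
    return loadings, residual


loadings_true = np.array([0.8, 0.7, 0.6, 0.5])      # hypothetical
R_reduced = np.outer(loadings_true, loadings_true)  # h^2 in the diagonal
centroid, residual = centroid_dimension(R_reduced)
```

In a single-factor matrix the centroid recovers the loadings exactly. With more dimensions, the same step is repeated on the (reflected) residual matrix.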

This complexity usually leads the simple-structure factorist, after 
having laboriously located the “underlying” primaries through rotational 
devices, to abandon efforts to estimate the scores of individuals on them. 
Thurstone recommends ([34], p. 515) that the complex regression score be
replaced by a simple cluster score on the nearest oblique cluster, precisely
the type of cluster domain that is directly located and measured by key
cluster analysis.

Orthodox unweighted general cluster analysis has been fully developed 
by Thurstone [32, 34] and his followers under the name “multiple factor
analysis.” In Table 1, column 7, if one compares the orthodox procedure
(2nd cell entries) originally formulated on the basic factor theorem with the 
procedures based on domain sampling (1st cell entries), they are seen to be 
identical except for certain unrefined features of the orthodox method: the 
use of highest r’s as initial estimates of communalities, the terminating of 
factoring by a statistical test of residual r’s, and the lack of reiteration of 
factoring to converged communalities. 

Weighted General Clusters (Principal Axes or Components) (Table 1, 
column 8). One may wish to attach differential weights to scores on the n 
variable domains in (9) in order to maximize the sum of partial communalities 
on each successive dimension. The definition of the dimensions in this case is 
otherwise identical to that in the preceding unweighted case. Procedurally 
(see Table 1, column 8), no initial reflection of variables is required, but an 
electronic computer is essential to solve reiteratively by least squares for the 
values of the partial communalities, row e. 
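The weights that maximize the sum of partial communalities on a dimension are given by the leading eigenvector of the reduced correlation matrix. The sketch below uses iterated principal-axis factoring, a common way of carrying out this reiteration. It should not be taken as the specific procedure of [14, 15, 20]. Following the text, it places communalities rather than unities in the diagonal. The function name, tolerance, and example are hypothetical.

```python
import numpy as np


def iterated_principal_axes(R, h2_start, k, tol=1e-6, max_iter=200):
    """Weighted general-cluster (principal axes) dimensions.

    The first k eigenvectors of the reduced matrix supply the weights that
    maximize the sum of partial communalities on each successive dimension;
    communalities are re-estimated from the k dimensions until they settle.
    """
    R = np.asarray(R, dtype=float)
    h2 = np.asarray(h2_start, dtype=float)
    A = np.zeros((R.shape[0], k))
    for _ in range(max_iter):
        reduced = R.copy()
        np.fill_diagonal(reduced, h2)
        values, vectors = np.linalg.eigh(reduced)        # ascending order
        top = np.argsort(values)[::-1][:k]               # k largest roots
        A = vectors[:, top] * np.sqrt(np.clip(values[top], 0.0, None))
        updated = (A**2).sum(axis=1)                     # total communalities
        if np.max(np.abs(updated - h2)) < tol:
            return A, updated, True
        h2 = updated
    return A, h2, False


l = np.array([0.8, 0.7, 0.6, 0.5])                       # hypothetical
R = np.outer(l, l)
np.fill_diagonal(R, 1.0)
A, h2, ok = iterated_principal_axes(R, l**2, k=1)
```

The sign of each eigenvector is arbitrary, so the loadings are compared in absolute value.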

In the orthodox method of weighted general clusters as described by 
Pearson [25], Hotelling [14, 15], and Kelley [20], considerable confusion has 
been introduced by their use of unities (and sometimes reliabilities) in the 
diagonals. A rationale for unity diagonals would be equally applicable to 
any of the other methods described above and equally inappropriate, for 
communalities are called for, as stated earlier in this paper (see also [13], 
ch. 7). The only basic feature that distinguishes the weighting of general 
clusters from the preceding centroid analysis is the decision to apply weights 

in the definition of each dimension. A main feature that distinguishes both
approaches from the key cluster method is the decision to set s equal to n.
As a consequence both general cluster methods require rotational procedures 
in order to describe cluster structure. 

Maximum Likelihood General Clusters (Table 1, column 9). As with the 
preceding two methods, Lawley’s maximum likelihood procedures [21, 22,
23, 27] determine k general cluster dimensions, s = n in (9). But Lawley’s
dimensions are significant in the sense that the final partial and total
communalities, reiteratively determined from trial values, are model population
values that would produce a correlation matrix from which there is maximum
likelihood that the observed statistical matrix represents sampling
fluctuations. A nonsignificant chi square test of the residual correlations is
taken as evidence that the model cannot be rejected. The procedures are onerous even for an electronic
calculator, since models with k progressively increased must be tested one 
at a time. Efficiency is greatly improved if good trial values of k and of the 
partial communalities can be found to initiate the reiterative procedures. 
This maximum likelihood approach has special appeal because of its capabili- 
ties of significance testing, but its gargantuan computation requirements and 
the need of rotation to a final cluster structure are serious limitations. 
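Fitting Lawley's maximum likelihood loadings is beyond a short sketch. The chi square test of a given k-dimensional solution, however, can be shown compactly, assuming the loadings and uniquenesses have already been estimated. The statistic uses the usual maximum likelihood discrepancy function. Two further choices are standard ones from the factor-analysis literature and are not stated in this paper: Bartlett's multiplier N − 1 − (2p + 5)/6 − 2k/3, and the degrees of freedom ((p − k)² − (p + k))/2. The function name and data are hypothetical.

```python
import numpy as np


def ml_chi_square(R, loadings, uniq, N):
    """Likelihood-ratio chi square for the model Sigma = L L' + Psi.

    Uses the maximum likelihood discrepancy with Bartlett's multiplier
    (a standard choice, not taken from the text).
    Returns (statistic, degrees of freedom).
    """
    R = np.asarray(R, dtype=float)
    L = np.asarray(loadings, dtype=float)
    p, k = L.shape
    Sigma = L @ L.T + np.diag(uniq)
    _, logdet_sigma = np.linalg.slogdet(Sigma)
    _, logdet_r = np.linalg.slogdet(R)
    F = logdet_sigma - logdet_r + np.trace(R @ np.linalg.inv(Sigma)) - p
    multiplier = N - 1 - (2 * p + 5) / 6 - 2 * k / 3
    df = ((p - k) ** 2 - (p + k)) / 2
    return multiplier * F, df


l = np.array([0.8, 0.7, 0.6, 0.5, 0.4, 0.3])          # hypothetical loadings
R = np.outer(l, l) + np.diag(1 - l**2)                # exact one-factor matrix
stat, df = ml_chi_square(R, l[:, None], 1 - l**2, N=200)
stat_bad, _ = ml_chi_square(R, 0.5 * l[:, None], 1 - 0.25 * l**2, N=200)
```

An exactly fitting model gives a statistic of zero. A misspecified one gives a positive value, which is referred to the chi square distribution with df degrees of freedom.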

To sum up the three general cluster approaches and offer a prospectus, 
all three general methods define dimensions as indiscriminate composite 
domain scores on all n variables. This definition is a common limitation 
because it leads to the uncertain procedures of rotation to simple structure. 
Paradoxically, the Thurstonian unweighted case, the most generally used
method of factor analysis [6, 8, 10, 34], is the least justifiable. A few books,
notably British ones, do put the method in the right perspective [1, 3, 31]. 
Centroid factoring is not as exact as the principal axes method in determining 
dimensionality. It lacks a test of significance, unlike the maximum likelihood 
method. For a general cluster solution the analyst would now use a modern 
computer programmed for the principal axes or maximum likelihood methods. 
As for the computational simplicity of the centroid method, the key cluster
methods are simpler still and have the additional merit of routinely describing
oblique structure without the need of rotational devices. 

Designing principal axes and maximum likelihood solutions so as to 
describe oblique cluster structure would remove their present inadequacies. 
Coupling key cluster analysis to them would seem to turn the trick.
A preliminary pivot variable (PV) analysis would quickly demarcate the 
desired set of k oblique clusters. If a method of least squares fit of k axes 
through these oblique subcentroids can be devised, the result would be a 
weighted key cluster or principal cluster solution. To increase the efficiency 
of a maximum likelihood method a preliminary PV analysis should, it would 
seem, provide good initial trial values, not only of dimensionality but also 
of partial communalities (appropriately unaugmented). If this method can 

be further modified to pass hypothetical population dimensions through 
the k key clusters, rotation would not be required, and the method would
retain the significance testing features of Lawley. The result would be a
maximum likelihood key cluster solution.


Rationally Designed Dimensional Analysis 


The order in which clusters are selected to define successive dimensions 
is determined in the key cluster methods of Table 1, columns 1 to 4, com- 
pletely objectively; the object is to maximize the independence of the variables 
that define different dimensions and to select them in decreasing order of 
salience. But key cluster analysis need not be so blindly empirical; it may be 
designed to test hypotheses based on theories about the structure and order 
of variance determination among the n variables. 

Designed TC and CC Analysis. An analyst may generate the hypothesis,
for example, that clusters among $n_A$ variables of type A (say, a group of
sociological variables) may better predict the communality variances of all
clusters among $n_B$ variables of type B (say, a group of social attitude variables)
than would be the case in the converse direction, B to A. To test the
hypothesis, he would perform a TC or CC analysis of the $(n_A + n_B)$-variable
matrix of correlations, but restrict the defining variables of the factored
dimensions solely to the A block of variables. After the terminating T-criterion
has been met by the A-variables, the residual communalities of the B-variables
would be their communality variances unpredictable by the A set. A fresh TC or CC
analysis in reverse order, B to A, would reveal residual communality variances 
of the A-variables unpredictable by the B set. Under the hypothesis, per- 
centage determination of the variances in the A to B design should be higher 
than in B to A. 

Designed PCC and RCC Analysis. Having empirically or rationally 
preclustered n variables, for example, of the sociological and attitude types, 
an analyst may on the basis of theory generate the hypothesis that only 
certain ones of the k’ clusters are salient predictors of the remaining clusters, 
and in a hypothetical order of salience. His test would consist of a dimensional 
analysis in which the successively factored dimensions would be defined by 
the different selected clusters arranged in the order of their hypothetical 
decreasing salience. After the T-criterion is met, then under the hypothesis
the analyst would discover that, for the variables not in the selected clusters, 
their residual communalities should in general have progressively decreased 
as factoring proceeded and should have become negligible on dimensions 
following those hypothesized as being salient. 

Designed Reanalyses. Any of the above designs may follow upon a pre- 
liminary purely empirical key cluster analysis. Such a preliminary study, 
including the oblique structure analysis, may lead to new ideas that the 
analyst may wish to test by additional types of dimensional analyses. 
For example, will a small number of higher-order, or composite, clusters 
maximally predict the communality variances of the n variables? To illus- 
trate, a study of the oblique structure of sociological and social attitude 
clusters may suggest that one composite of sociological clusters and one 
composite of attitude clusters may leave but a minor amount of the com- 
munalities of the variables unpredictable. The test in this case would be 
a new dimensional analysis in which the first two dimensions would be defined 
respectively by the two composite clusters. The residual communalities of 
the variables would constitute the parts unpredictable from these two 
dimensions. Thurstone’s single second-order factor analysis is a special 
design, discussed earlier, in which one such general composite cluster may 
be so tested. 

Another type of reanalysis consists of bringing together in one common 
new dimensional analysis clusters found to be the most nearly independent 
sets in different prior analyses. To illustrate, using urban neighborhoods as 
objects, the writer performed three separate CC analyses: on sociological 
variables in 1940, on the same characteristics in 1950, and on voting variables 
in 1954. Each of the separate analyses yielded three salient oblique dimensions. 
The nine oblique clusters were then projected into a single common di- 
mensional analysis. This master analysis still yielded three salient oblique 
dimensions, demonstrating thereby the common tridimensionality of these 
characteristics over more than a decade. In such reanalyses the analyst may 
also wish to include brand new variables that he considers theoretically to 
belong to the common structure. 


Summary 


The general method of multidimensional analysis, designed on domain 
sampling principles, covers as special cases all the main types of cluster and 
factor analysis. The different methods vary primarily in special decisions 
about the nature of an independent dimension. Such a dimension is defined 
in general as a composite score on a cluster of variable-domains. In cluster 
analysis terms the special methods of key cluster analysis, denoted as TC, 
CC, PCC, and RCC, define each dimension as a selected set or subgroup 
from all the n variables. CC analysis is the most generally applicable. Square
root or diagonal factor analysis, called PV analysis, pivots each dimension
on one central variable. Bifactor and second-order factor analyses define the
first dimension as a general cluster of all n variables, and the later dimensions
as key clusters. Among the general cluster methods that define each dimension
as a composite of all n variables, centroid factor analysis defines it as an
unweighted total cluster domain, principal axes as a weighted one, and
maximum likelihood factor analysis also as a weighted one, with the
additional feature of supplying a technique for estimating the number of
statistically significant dimensions required. The key cluster methods
determine simple cluster structure as a routine aspect of the factoring process,
whereas the general cluster methods require laborious rotations to determine
structure. The key cluster methods can be applied blind, but they can also
be designed to test hypotheses.


REFERENCES 


[1] Adcock, C. J. Factorial analysis for non-mathematicians. Carlton: Melbourne Univ. 
Press, 1954. 

[2] Burt, C. The distribution and relations of educational abilities. London: King, 1917. 

[3] Burt, C. The factors of the mind. New York: Macmillan, 1941. 

[4] Burt, C. Alternative methods of factor analysis and their relations to Pearson’s method
of principal axes. Brit. J. Psychol., Statist. Sec., 1949, 2, 98-121.

[5] Carroll, J. B. An analytic solution for approximating simple structure in factor analy- 
sis. Psychometrika, 1953, 18, 23-38. 

[6] Cattell, R. B. Factor analysis. New York: Harper, 1952. 

[7] Cohen, J. The factorial structure of WAIS between early adulthood and old age. 
J. consult. Psychol., 1957, 21, 283-290. 

[8] Fruchter, B. Introduction to factor analysis. New York: Van Nostrand, 1954.

[9] Garnett, J. C. M. On certain independent factors of mental measurements. Proc. Roy.
Soc., 1919, A, 96, 91-111. 

[10] Guilford, J. P. Psychometric methods. (2nd ed.) New York: McGraw-Hill, 1954.

[11] Guttman, L. A new approach to factor analysis: the radex. In P. F. Lazarsfeld (Ed.),
Mathematical thinking in the social sciences. New York: Columbia Univ. Press, 1956. 

[12] Holzinger, K. J. A simple method of factor analysis. Psychometrika, 1944, 9, 257-262. 

[13] Holzinger, K. J. and Harman, H. Factor analysis. Chicago: Univ. Chicago Press, 
1941. 

[14] Hotelling, H. Analysis of a complex of statistical variables into principal components. 
J. educ. Psychol., 1933, 24, 417-441, 498-520. 

[15] Hotelling, H. Simplified calculation of principal components. Psychometrika, 1935, 1, 
27-35. 

[16] Kaiser, H. F. Solution for the communalities: a preliminary report. Rep. No. 5, 
AF 41(657)-76, Univ. Calif., Berkeley, Sept., 1956. 

[17] Kaiser, H. F. Further numerical investigation of the Tryon-Kaiser solution for the 
communalities. Rep. No. 14, AF 41(657)-76, Univ. Calif., Berkeley, May, 1957. 

[18] Kaiser, H. F. The varimax criterion for analytic rotation in factor analysis. Psycho- 
metrika, 1958, 23, 187-200. 

[19] Kelley, T. L. Crossroads in the mind of man. Stanford: Stanford Univ. Press, 1928. 

[20] Kelley, T. L. Essential traits of mental life. Cambridge: Harvard Univ. Press, 1935. 

[21] Lawley, D. N. The estimation of factor loadings by the method of maximum likeli- 
hood. Proc. Roy. Soc., Edinburgh, 1940, 60, 64-82. 

[22] Lawley, D. N. The maximum likelihood method of estimating factor loadings. Ch. 21 
in G. Thomson, The factorial analysis of human ability. (5th ed.) London: Univ. 
London Press, 1951. : 

[23] Lord, F. M. A study of speed factors in tests and academic grades. Psychometrika, 
1956, 21, 31-50. 

[24] Neuhaus, J. O. and Wrigley, C. The quartimax method: an analytic approach to 
orthogonal simple structure. Brit. J. statist. Psychol., 1954, 7, 81-91. 

[25] Pearson, K. On lines and planes of closest fit to systems of points in space, Phil. Mag., 
1901, 6th Ser., 559ff. 

















ROBERT C. TRYON 135 


[26] Pinzka, C. and Saunders, D. R. Analytic rotation to simple structure. II. Extension 
to an oblique solution. Princeton, N. J.: Educ. Test. Serv. Res. Bull., Aug., 1954. 

[27] Rao, C. R. Estimation and tests of significance in factor analysis. Psychometrika, 
1955, 20, 93-111. 

[28] Saunders, D. R. An analytic method for rotation to orthogonal simple structure. 
Princeton, N. J.: Educ. Test. Serv. Res. Bull., Aug., 1953. 

[29] Spearman, C. General intelligence objectively determined and measured. Amer. J. 
Psychol., 1904, 15, 201-293. 

[30] Spearman, C. The abilities of man. London: Macmillan, 1927. 

[31] Thomson, G. The factorial analysis of human ability. (5th ed.) London: Univ. London 
Press, 1951. 

[32] Thurstone, L. L. The vectors of mind. Chicago: Univ. Chicago Press, 1935. 

[33] Thurstone, L. L. A multiple group of factoring the correlation matrix. Psychometrika, 
1945, 10, 73-78. 

[34] Thurstone, L. L. Multiple-factor analysis. Chicago: Univ. Chicago Press, 1947. 

[35] Thurstone, L. L. Note about the multiple group method. Psychometrika, 1949, 14, 
43-45. 

[36] Tryon, R. C. A theory of psychological components—an alternative to “mathematical 
factors.”’ Psychol. Rev., 1935, 42, 425-454. 

[37] Tryon, R. C. Cluster analysis. Ann Arbor, Mich.: Edwards, 1939. 

[38] Tryon, R. C. Identification of social areas from cluster analysis. Univ. Calif. Publ. 
Psychol., 1955, 8, No. 1, 1-100. Berkeley: Univ. Calif. Press. 

[39] Tryon, R. C. Reliability and behavior domain-validity: reformulation and historical 
critique. Psychol. Bull., 1957, 54, 229-249. 

[40] Tryon, R. C. Communality of a variable: formulation from cluster analysis. Psycho- 
metrika, 1957, 22, 241-259. 

[41] Tryon, R. C. Cumulative communality cluster analysis. Educ. psychol. Measmt, 1958, 
18, 3-35. 

[42] Tryon, R. C. General dimensions of individual differences: cluster analysis vs. multiple 
factor analysis, Educ. psychol. Measmt, 1958, 18, 477-495. 

[43] Wrigley, C. F. and Neuhaus, J. O. The matching of two sets of factors. Amer. Psy- 
chologist, 1955, 10, 418-419. (Abstract) 

[44] Wrigley, C. F., Cherry, C. N., Lee, M. C., and McQuitty, L. L. Use of the square root 
method to identify factors in the job performance of aircraft mechanics. Psychol. 
Monogr., 1956, 71, No. 1 (Whole No. 430). 

[45] Wrigley, C. The effect upon the communalities of changing the estimate of the number 
of factors. Rep. No. 13, AF 41(657)-76, Univ. Calif., Berkeley, March, 1957. 

[46] Wrigley, C. F. An empirical comparison of various methods for estimating com- 
munalities. Educ. psychol. Measmt, in press. 


Manuscript received 5/7/58 
Revised manuscript received 9/10/58 





























PSYCHOMETRIKA—VOL. 24, No. 2 
JUNE, 1959 


LEAST SQUARES ESTIMATION IN FINITE 
MARKOV PROCESSES 


ALBERT MADANSKY 


RAND CORPORATION 


The usual least squares estimate of the transitional probability matrix 
of a finite Markov process is given for the case in which, for each point in 
time, only the proportions of the sample in each state are known. The purpose
of this paper is to give another estimate of this matrix and to investigate
the properties of this estimate. It is shown that this estimate is consistent 
and asymptotically more efficient than the previously considered estimate in 
a sense defined in this paper. 


Notation 


The matrix conventions and terminology used in [2, 4, and 6] will be
followed. Let $m_{ik}$ ($i = 1, 2, \dots, a$; $k = 1, 2, \dots, n$) be the proportion
observed on trial $k$ in alternative category $i$ of a multinomial population
based on a sample of size $S$. Let $E(m_{ik}) = \mu_{ik}$, where $0 < \mu_{ik} < 1$ and
$\sum_i \mu_{ik} = 1$ for each $k$. Let

$$M = \begin{bmatrix} m_{11} & \cdots & m_{1,n-1} \\ \vdots & & \vdots \\ m_{a1} & \cdots & m_{a,n-1} \end{bmatrix}.$$

Also, let $t_{ij}$ be the transitional probability that an observation which is in
alternative category $j$ at a given trial will be in alternative category $i$ at the next
trial ($i, j = 1, 2, \dots, a$). Define $T_i = [t_{i1} \cdots t_{ia}]$ and

$$T = \begin{bmatrix} t_{11} & \cdots & t_{1a} \\ \vdots & & \vdots \\ t_{a1} & \cdots & t_{aa} \end{bmatrix}.$$

Let $N_i = [m_{i2} \cdots m_{in}]$, and let $N$ be the $a \times (n - 1)$ matrix with $N_i$ as
row $i$. Also let

$$\mu = \begin{bmatrix} \mu_{11} & \cdots & \mu_{1,n-1} \\ \vdots & & \vdots \\ \mu_{a1} & \cdots & \mu_{a,n-1} \end{bmatrix},$$

and let $\nu_i = [\mu_{i2} \cdots \mu_{in}]$, and let $\nu$ be the $a \times (n - 1)$ matrix with $\nu_i$ as row $i$.

Then $E(M) = \mu$, $E(N_i) = \nu_i$, and $E(N) = \nu$. Also, by the Markovian
assumption, $\nu_i = T_i\mu$, so that $\nu = T\mu$ and $T = \nu\mu'(\mu\mu')^{-1}$ if $\mu\mu'$ is nonsingular.

One should note that since the Markov process has $a(a - 1)$ parameters,
all these parameters cannot be estimated unless $n(a - 1) \geq a(a - 1)$, i.e.,
unless the number of independent $m_{ik}$ is greater than or equal to the number
of parameters to be estimated. It therefore should be assumed that $n \geq a$.
It shall, however, be assumed that $\mu\mu'$ is nonsingular, and hence that
$n - 1 \geq a$, a condition necessary for $\mu\mu'$ to be nonsingular. Other assumptions will
be introduced as they become necessary.


Estimation of T 


Since $M$ is not a matrix of constants, the usual proof in the theory of
least squares (cf. [5], p. 55) does not apply in this case to show that

$$\hat T_i = N_iM'(MM')^{-1}$$

is the least squares estimate of $T_i$. One can work conditionally on $M$ and
find the $T_i$ which minimizes

$$(T_iM - N_i)(T_iM - N_i)' = C_iC_i',$$

given $M$. In that case

$$\hat T_i = N_iM'(MM')^{-1}$$

if $MM'$ is nonsingular, as is shown in [2], and the estimate of $T$, conditional
on $M$, which minimizes $C_iC_i'$ for each $i$ is

$$\hat T = NM'(MM')^{-1}.$$
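A minimal NumPy sketch of $\hat T = NM'(MM')^{-1}$, assuming the proportions $m_{ik}$ are held in an $a \times n$ array whose column $k$ is trial $k$, and using the convention above that $t_{ij}$ is the probability of moving from category $j$ to category $i$. The function names and the two-state chain are hypothetical illustrations, not part of the paper. With exact expected proportions $\mu$ in place of the $m$'s, the formula should return $T$ itself, as the identity $T = \nu\mu'(\mu\mu')^{-1}$ requires.

```python
import numpy as np


def ls_estimate(P):
    """Conditional least squares estimate T_hat = N M' (M M')^{-1}.

    P is an a x n array; column k holds the proportions in each category on trial k.
    Convention (as in the text): T[i, j] is the probability of moving from j to i.
    """
    M = P[:, :-1]  # trials 1 .. n-1
    N = P[:, 1:]   # trials 2 .. n
    # (M M') X = M N' gives X = (M M')^{-1} M N' = T_hat'; solve instead of inverting.
    return np.linalg.solve(M @ M.T, M @ N.T).T


def expected_proportions(T, mu0, n):
    """Columns mu_1, ..., mu_n with mu_{k+1} = T mu_k (the Markovian assumption)."""
    cols = [np.asarray(mu0, dtype=float)]
    for _ in range(n - 1):
        cols.append(T @ cols[-1])
    return np.column_stack(cols)


# Hypothetical two-state chain; each column of T sums to 1.
T_true = np.array([[0.9, 0.3],
                   [0.1, 0.7]])
mu = expected_proportions(T_true, [1.0, 0.0], n=10)
print(np.round(ls_estimate(mu), 6))
```

Solving the normal equations with `np.linalg.solve` rather than forming $(MM')^{-1}$ is only a numerical choice; it gives the same matrix whenever $MM'$ is nonsingular.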


Even if $M$ were a matrix of constants (or even if one works conditionally
on $M$), $\hat T_i$ would still not be the "least squares" estimate of $T_i$. Although
the elements of $C_i$ have mean zero (this can be seen as a corollary of (2.14)
of [1]), they are not uncorrelated and do not all have the same variance,
$\sigma^2$.* In a situation such as this, the least squares estimate of $T_i$ is obtained
by transforming $C_i$ to

$$\tilde C_i = C_i\Lambda_i^{-1/2},$$

whose elements are uncorrelated and have the same variance, $\sigma^2 = 1$, and
by minimizing

$$\tilde C_i\tilde C_i' = C_i\Lambda_i^{-1}C_i',$$

where $\Lambda_i$ is the covariance matrix of $C_i$. Let us first determine $\Lambda_i$ and $\Lambda_i^{-1}$.
Use is made of the following lemma.

*I am indebted to a referee for pointing out that equal variances are not required
to use the method of least squares for estimation, but only for the optimal
properties of least squares estimates (in particular, minimum variance among linear
unbiased estimates, as defined, e.g., in [3]) to be realized. The referee also points out
that $\operatorname{cov}(m_{jk}, C_{ik}) = 0$, i.e., the error is uncorrelated with the independent variables in
the relation $m_{i,k+1} = \sum_j t_{ij}m_{jk} - C_{ik}$. This can be seen as a direct application of Lemma 1.


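LEMMA 1. $\operatorname{Cov}(m_{in}, m_{r,n-z}) = \mu_{r,n-z}(t_{ir}^{(z)} - \mu_{in})/S$, where $t_{ir}^{(z)}$ is element
$(i, r)$ of $T^z$.

PROOF. Define

$$y_{ijs} = \begin{cases} 1 & \text{if individual } s \text{ is in category } i \text{ at time } j, \\ 0 & \text{otherwise.} \end{cases}$$

Then

$$\begin{aligned}
\operatorname{Cov}(m_{in}, m_{r,n-z}) &= \sum_{s=1}^{S} \operatorname{Cov}(y_{ins}, y_{r,n-z,s})/S^2 \\
&= [\Pr\{y_{ins} = 1,\ y_{r,n-z,s} = 1\} - \mu_{in}\mu_{r,n-z}]/S \\
&= [\Pr\{y_{ins} = 1 \mid y_{r,n-z,s} = 1\}\,\mu_{r,n-z} - \mu_{in}\mu_{r,n-z}]/S \\
&= (\mu_{r,n-z}t_{ir}^{(z)} - \mu_{in}\mu_{r,n-z})/S. \qquad \text{Q.E.D.}
\end{aligned}$$

When $z = 0$, $t_{ir}^{(0)}$ is defined to be $\delta_{ir}$, where $\delta_{ir} = 1$ if $i = r$ and $0$ otherwise, and the usual
multinomial variances and covariances result, namely

$$\operatorname{cov}(m_{in}, m_{rn}) = -\mu_{in}\mu_{rn}/S \qquad (i \neq r)$$

and

$$\operatorname{var} m_{in} = \mu_{in}(1 - \mu_{in})/S.$$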


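Let $C_{ik}$ be the $k$th element of $C_i$, $k = 1, \dots, n - 1$. Then

$$\begin{aligned}
S\operatorname{var} C_{ik} &= S\operatorname{var}\Big(\sum_{j=1}^{a} t_{ij}m_{jk} - m_{i,k+1}\Big) \\
&= \mu_{i,k+1}(1 - \mu_{i,k+1}) + \sum_{j=1}^{a} t_{ij}^2\mu_{jk}(1 - \mu_{jk}) - \sum_{j \neq j'} t_{ij}t_{ij'}\mu_{jk}\mu_{j'k} - 2\sum_{j=1}^{a} t_{ij}\mu_{jk}(t_{ij} - \mu_{i,k+1}) \\
&= \mu_{i,k+1}(1 + \mu_{i,k+1}) - \sum_{j=1}^{a} t_{ij}^2\mu_{jk} - \Big(\sum_{j=1}^{a} t_{ij}\mu_{jk}\Big)^2 \\
&= \mu_{i,k+1} - \sum_{j=1}^{a} t_{ij}^2\mu_{jk},
\end{aligned}$$

the last step using $\sum_j t_{ij}\mu_{jk} = \mu_{i,k+1}$; and, for $l > k$,

$$\begin{aligned}
S\operatorname{cov}(C_{ik}, C_{il}) &= \mu_{i,k+1}\big(t_{ii}^{(l-k)} - \mu_{i,l+1}\big) - \sum_{j=1}^{a} t_{ij}\mu_{i,k+1}\big(t_{ji}^{(l-k-1)} - \mu_{jl}\big) \\
&\quad - \sum_{j=1}^{a} t_{ij}\mu_{jk}\big(t_{ij}^{(l+1-k)} - \mu_{i,l+1}\big) + \sum_{j=1}^{a}\sum_{r=1}^{a} t_{ij}t_{ir}\mu_{jk}\big(t_{rj}^{(l-k)} - \mu_{rl}\big) \\
&= -\sum_{j=1}^{a} t_{ij}\big[\mu_{jk}t_{ij}^{(l+1-k)} + \mu_{i,k+1}t_{ji}^{(l-k-1)}\big] + \sum_{j=1}^{a}\sum_{r=1}^{a} t_{ij}t_{ir}\mu_{jk}t_{rj}^{(l-k)} + \mu_{i,k+1}t_{ii}^{(l-k)} = 0,
\end{aligned}$$

since

$$\sum_{r=1}^{a} t_{ir}t_{rj}^{(l-k)} = t_{ij}^{(l+1-k)} \quad\text{and}\quad \sum_{j=1}^{a} t_{ij}t_{ji}^{(l-k-1)} = t_{ii}^{(l-k)}.$$

Hence $\Lambda_i$ is a diagonal matrix with diagonal elements

$$\mu_{i,k+1} - \sum_{j=1}^{a} t_{ij}^2\mu_{jk}.$$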


Now $\Lambda_i$ is an unknown matrix, but each $\mu$ is consistently estimated by
its corresponding $m$. Also, since

$$\operatorname*{plim}_{S\to\infty} N = \nu \quad\text{and}\quad \operatorname*{plim}_{S\to\infty} M = \mu,$$

it follows that

$$\operatorname*{plim}_{S\to\infty} \hat T = \nu\mu'(\mu\mu')^{-1} = T,$$

so that the elements of $\hat T$ are consistent estimates of the corresponding elements of
$T$. Therefore, $\hat\Lambda_i$, a consistent estimate of $\Lambda_i$, can be formed by substituting
the $m$'s and the elements of $\hat T$ for the corresponding $\mu$'s and $t$'s in the above
equations for the elements of $\Lambda_i$.
Consider the estimate

$$\tilde T_i = N_i\hat\Lambda_i^{-1}M'(M\hat\Lambda_i^{-1}M')^{-1}$$

(if $M\hat\Lambda_i^{-1}M'$ is nonsingular), which would be the least squares estimate of
$T_i$ if $\hat\Lambda_i^{-1} = \Lambda_i^{-1}$ and if $\Lambda_i$ were not a function of the $t$'s. It is easy to see that
$\tilde T_i$ is also a consistent estimate of $T_i$.
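A sketch of $\tilde T_i$, assuming the same array layout as in the earlier sketch. Here `lambda_hat_diag` substitutes the $m$'s and the elements of $\hat T$ into the diagonal formula for $\Lambda_i$, and the optional `floor` (a hypothetical parameter) mimics the replacement of negative estimates by .001 described in the Discussion. Because $\Lambda_i$ is diagonal, $\hat\Lambda_i^{-1}$ is applied as a column weighting rather than as an explicit matrix.

```python
import numpy as np


def ls_estimate(P):
    """T_hat = N M'(M M')^{-1}; column k of P holds the proportions on trial k."""
    M, N = P[:, :-1], P[:, 1:]
    return np.linalg.solve(M @ M.T, M @ N.T).T


def lambda_hat_diag(P, T_hat, i, floor=1e-3):
    """Estimated diagonal of Lambda_i: m_{i,k+1} - sum_j t_ij^2 m_jk, k = 1, ..., n-1.

    Entries below `floor` are replaced by it, following the .001 substitution
    used in the worked example (Lambda_i must stay nonsingular).
    """
    M, N = P[:, :-1], P[:, 1:]
    return np.maximum(N[i] - (T_hat[i] ** 2) @ M, floor)


def weighted_estimate(P, i, floor=1e-3):
    """Row i of T_tilde: N_i L^{-1} M' (M L^{-1} M')^{-1}, with L = Lambda_hat_i."""
    M, N = P[:, :-1], P[:, 1:]
    w = 1.0 / lambda_hat_diag(P, ls_estimate(P), i, floor)  # diagonal of L^{-1}
    A = (M * w) @ M.T   # M L^{-1} M'  (a x a, symmetric)
    b = (M * w) @ N[i]  # M L^{-1} N_i'
    return np.linalg.solve(A, b)


# Usage with exact expected proportions of a hypothetical two-state chain.
T_true = np.array([[0.9, 0.3], [0.1, 0.7]])
cols = [np.array([1.0, 0.0])]
for _ in range(9):
    cols.append(T_true @ cols[-1])
mu = np.column_stack(cols)
print([np.round(weighted_estimate(mu, i), 6) for i in range(2)])
```

With exact expected proportions the weighted estimate returns $T_i$ for any positive weights, since $\nu_i = T_i\mu$; with sample proportions $\hat T_i$ and $\tilde T_i$ generally differ.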

If $R$ and $Q$ are symmetric matrices, then $R \geq Q$ if and only if
$x(R - Q)x' \geq 0$ for all vectors $x$. Let us adapt the definition of minimum
variance linear unbiased estimate of a vector-valued parameter given in
([3], section 2.5) as follows. It may be said that, of two consistent estimators,
say $u$ and $v$, of a vector-valued parameter $\theta$, $u$ is asymptotically minimum
variance if the asymptotic covariance matrix of $\sqrt{S}(u - \theta)$ is less than
or equal to the asymptotic covariance matrix of $\sqrt{S}(v - \theta)$ in the sense
defined above.

THEOREM. Let $\hat\Sigma_i$ be the asymptotic covariance matrix of $\sqrt{S}(\hat T_i - T_i)$,
and let $\tilde\Sigma_i$ be the asymptotic covariance matrix of $\sqrt{S}(\tilde T_i - T_i)$. Then $\tilde\Sigma_i \leq \hat\Sigma_i$.


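PROOF. We know that

$$m_{ik} = \mu_{ik} + O(1/\sqrt{S}),$$

since, for fixed $k$, $m_{ik}$ is the maximum likelihood estimate of $\mu_{ik}$. Therefore

$$\hat t_{ij} = t_{ij} + O(1/\sqrt{S})$$

and also

$$\tilde t_{ij} = t_{ij} + O(1/\sqrt{S})$$

for all $i$ and $j$. Then write

$$N_i = \nu_i + X/\sqrt{S}, \qquad M = \mu + Y/\sqrt{S}, \qquad \hat\Lambda_i^{-1} = \Lambda_i^{-1} + Z/\sqrt{S}.$$

To terms of order $1/\sqrt{S}$,

$$MM' = \mu\mu' + (\mu Y' + Y\mu')/\sqrt{S},$$

so that

$$(MM')^{-1} = [I - (\mu\mu')^{-1}(\mu Y' + Y\mu')/\sqrt{S}](\mu\mu')^{-1}$$

to terms of order $1/\sqrt{S}$. Similarly,

$$(M\hat\Lambda_i^{-1}M')^{-1} = [I - (\mu\Lambda_i^{-1}\mu')^{-1}(\mu\Lambda_i^{-1}Y' + \mu Z\mu' + Y\Lambda_i^{-1}\mu')/\sqrt{S}](\mu\Lambda_i^{-1}\mu')^{-1}$$

to terms of order $1/\sqrt{S}$. Therefore, asymptotically,

$$\sqrt{S}(\hat T_i - T_i) = [(\nu_iY' + X\mu') - T_i(\mu Y' + Y\mu')](\mu\mu')^{-1}$$

and

$$\sqrt{S}(\tilde T_i - T_i) = [(\nu_i\Lambda_i^{-1}Y' + \nu_iZ\mu' + X\Lambda_i^{-1}\mu') - T_i(\mu\Lambda_i^{-1}Y' + \mu Z\mu' + Y\Lambda_i^{-1}\mu')](\mu\Lambda_i^{-1}\mu')^{-1}.$$

Since $\nu_i = T_i\mu$, the above equations reduce to

$$\sqrt{S}(\hat T_i - T_i) = (X - T_iY)\mu'(\mu\mu')^{-1}$$

and

$$\sqrt{S}(\tilde T_i - T_i) = (X - T_iY)\Lambda_i^{-1}\mu'(\mu\Lambda_i^{-1}\mu')^{-1}.$$

But the covariance matrix of $X - T_iY$ is $\Lambda_i$, so that

$$\hat\Sigma_i = (\mu\mu')^{-1}(\mu\Lambda_i\mu')(\mu\mu')^{-1}$$

and

$$\tilde\Sigma_i = (\mu\Lambda_i^{-1}\mu')^{-1}.$$

By using the Schwarz inequality for matrices given in ([3], section 2.5), it
follows immediately that

$$\hat\Sigma_i \geq \tilde\Sigma_i.$$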
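The inequality $\hat\Sigma_i \geq \tilde\Sigma_i$ can be illustrated numerically. The sketch forms both matrices from the expressions just derived for a randomly generated $\mu$ and diagonal $\Lambda_i$ (hypothetical values, not from the paper) and inspects the smallest eigenvalue of the difference, which should be non-negative up to rounding. When $\Lambda_i$ is a multiple of the identity the two matrices coincide.

```python
import numpy as np


def asymptotic_covs(mu, lam):
    """Sigma_hat and Sigma_tilde from the proof, for Lambda_i = diag(lam).

    mu is the a x (n-1) matrix of expected proportions; lam holds the positive
    diagonal elements of Lambda_i.
    """
    G = np.linalg.inv(mu @ mu.T)                    # (mu mu')^{-1}
    sigma_hat = G @ (mu * lam) @ mu.T @ G           # (mu mu')^{-1} mu Lambda mu' (mu mu')^{-1}
    sigma_tilde = np.linalg.inv((mu / lam) @ mu.T)  # (mu Lambda^{-1} mu')^{-1}
    return sigma_hat, sigma_tilde


rng = np.random.default_rng(1)
mu = rng.random((3, 12))
mu /= mu.sum(axis=0)                     # hypothetical proportions: columns sum to 1
lam = rng.uniform(0.05, 0.25, size=12)   # hypothetical positive diagonal of Lambda_i
s_hat, s_tilde = asymptotic_covs(mu, lam)
# Smallest eigenvalue of Sigma_hat - Sigma_tilde; non-negative up to rounding error.
print(np.linalg.eigvalsh(s_hat - s_tilde).min())
```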


Thus $\tilde T_i$ is asymptotically more efficient, in the sense defined above,
than is $\hat T_i$. In fact, a modification of the above proof will show that $\tilde T_i$ is
asymptotically at least as efficient as any other estimator of the form
$N_iPM'(MPM')^{-1}$, where $P$ is a positive definite matrix.

One should note that $\tilde T$ is obtained by first computing $\hat T$ and then
modifying it. One can just as well, via the same procedure as above, modify
$\tilde T$ to obtain another estimate of $T$, say $T^*$. However, the asymptotic
efficiency of $T_i^*$ will be the same as that of $\tilde T_i$ for each $i$, since $T_i^*$ will be of the
form $N_iPM'(MPM')^{-1}$, where $P$ is $\hat\Lambda_i^{-1}$ based on $\tilde T$, i.e.,

$$P = \Lambda_i^{-1} + O(1/\sqrt{S}).$$

It has been shown in [2] that $\hat T$ has the desirable property that $e\hat T = e$,
where $e$ is the $a$-dimensional vector $(1\ 1 \cdots 1)$. That this is not true of $\tilde T$
can be seen from the example of the next section.

Discussion 


Let us consider Miller's example [cf. 6]. In his case $a = 2$, $n = 20$, and

$$\hat T = \begin{bmatrix} .92 & .39 \\ .08 & .61 \end{bmatrix}.$$

Using this estimate and $N$ and $M$, the diagonal elements of $\hat\Lambda_1$ are found to be

[.20 −.04 −.03 .23 .09 .09 −.01 .16 .19 .02
−.01 .16 .19 .22 .15 −.05 .09 .19 .12]

and the diagonal elements of $\hat\Lambda_2$ are

[.11 .28 .25 .05 .12 .12 .22 .08 .02 .16
.22 .08 .02 −.04 −.01 .19 .12 .02 .06].

Note that some of these elements are negative. However, since these
elements are estimates of variances (which are non-negative quantities),
these "estimates" were replaced by .001 (zero cannot be used since $\hat\Lambda_i$ must
be nonsingular). It is then found that

$$\tilde T = \begin{bmatrix} .81 & .25 \\ -.02 & .49 \end{bmatrix}.$$














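This example illustrates that $e\tilde T \neq e$ in general. It also shows that elements of
$\tilde T$ may be inadmissible in that they can turn out to be negative. However,
as Goodman points out [2], $\hat T$ also has this latter fault.

We suggest in this case the estimate

$$\begin{bmatrix} .81 & .25 \\ .19 & .75 \end{bmatrix}.$$

To compare $\hat T$ with this estimate, it suffices to compare the covariance matrices of
$\hat T_1$ and $\tilde T_1$ (since $\hat T_2 = e - \hat T_1$, and the second row of the suggested estimate is $e - \tilde T_1$).
An estimate of $\tilde\Sigma_1$ is $(M\hat\Lambda_1^{-1}M')^{-1}$, and $\hat\Sigma_1$ is estimated by

$$(MM')^{-1}(M\hat\Lambda_1M')(MM')^{-1}.$$

After re-estimating $\Lambda_1$ by using the suggested estimate of $T$ (and finding that all the diagonal
elements of this new estimate are positive), we used this estimate and found that

$$\text{est}\,\tilde\Sigma_1 = \begin{bmatrix} .049 & -.123 \\ -.123 & .517 \end{bmatrix}$$

and

$$\text{est}\,\hat\Sigma_1 = \begin{bmatrix} .059 & -.149 \\ -.149 & .616 \end{bmatrix}.$$

The easiest way to see that $\text{est}\,\hat\Sigma_1 \geq \text{est}\,\tilde\Sigma_1$, where $\geq$ is as defined above,
is to note that all principal minors of $\text{est}\,\hat\Sigma_1 - \text{est}\,\tilde\Sigma_1$ are non-negative.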
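The principal-minor check can be written out directly. The sketch uses the rounded estimates reported above, with the off-diagonal entries taken as equal because covariance matrices are symmetric. For a symmetric matrix, non-negativity of all principal minors (not only the leading ones) is equivalent to non-negative definiteness, which is the ordering defined above.

```python
from itertools import combinations

import numpy as np

# Rounded estimates reported in the text (symmetric 2 x 2 matrices).
est_sigma_tilde = np.array([[0.049, -0.123],
                            [-0.123, 0.517]])
est_sigma_hat = np.array([[0.059, -0.149],
                          [-0.149, 0.616]])


def principal_minors(A):
    """All principal minors det(A[idx, idx]) over non-empty index sets idx."""
    n = A.shape[0]
    return [np.linalg.det(A[np.ix_(idx, idx)])
            for size in range(1, n + 1)
            for idx in combinations(range(n), size)]


minors = principal_minors(est_sigma_hat - est_sigma_tilde)
print([round(float(m), 6) for m in minors])
print(all(m >= 0 for m in minors))
```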


Goodman also states in [2] that if the observed transitional proportions
are available, they would clearly be more appropriate in the estimation of
transitional probabilities. In [1] Anderson and Goodman give estimates of
$T$ and the asymptotic variances and covariances of the elements of their
estimate when the observed transitional proportions are available. Calling
their estimates $\bar t_{ij}$, the asymptotic variances and covariances may be written
as

$$\operatorname{var}\bar t_{ij} = t_{ij}(1 - t_{ij})/\phi_j, \qquad \operatorname{cov}(\bar t_{ij}, \bar t_{kl}) = -\delta_{jl}t_{ij}t_{kl}/\phi_j,$$

where $\delta_{jl} = 1$ if $j = l$ and $0$ otherwise, and

$$\phi_j = \sum_{k=2}^{n} \mu_{j,k-1}.$$

We can estimate $\mu_{jk}$ by $m_{jk}$ and so, in the example under consideration,
we can estimate $\Sigma_1$, the asymptotic covariance matrix of the estimate of $T_1$
(and hence of $T_2 = e - T_1$) when the observed transitional proportions are
available, in order to see the improvement when this information is available.
It is found that $\text{est}\,\phi_1 = 15.3$ and $\text{est}\,\phi_2 = 3.7$, so that

$$\text{est}\,\Sigma_1 = \begin{bmatrix} .010 & .000 \\ .000 & .051 \end{bmatrix},$$

using the suggested estimate to estimate $T$. Hence $\text{est}\,\Sigma_1 \leq \text{est}\,\tilde\Sigma_1$ in this case.
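The comparison with the Anderson and Goodman estimate can be reproduced from the formulas above. The sketch assumes that the first row of the suggested estimate, $(.81, .25)$, is the value substituted for $T_1$ (the reported .051 agrees with $.25 \times .75 / 3.7$ to three decimals), and that the off-diagonal term vanishes because $\delta_{jl} = 0$ for $j \neq l$.

```python
import numpy as np

# First row of the suggested estimate and the estimated phi_j reported in the text.
t_row = np.array([0.81, 0.25])  # (t_11, t_12)
phi = np.array([15.3, 3.7])     # (est phi_1, est phi_2)

# var(t_1j) = t_1j (1 - t_1j) / phi_j; cov(t_11, t_12) = 0 since delta_12 = 0.
est_sigma_ag = np.diag(t_row * (1 - t_row) / phi)

est_sigma_tilde = np.array([[0.049, -0.123],
                            [-0.123, 0.517]])

print(np.round(est_sigma_ag, 3))
# All eigenvalues positive means est Sigma_tilde_1 - est Sigma_1 is positive definite.
print(np.linalg.eigvalsh(est_sigma_tilde - est_sigma_ag))
```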

With regard to the amount of extra computation involved in computing
$\tilde T$, one can see from the form of the estimate that once $\hat T$ is given, the
modification process will take about as much time as the computation of $\hat T$ itself.
Since we have no idea of the relative decrease in the variances of the
estimates, we cannot discuss the trade-off between the doubled computation
time and the reduction in variance. This trade-off is, however, an important
practical factor in determining whether $\hat T$ or $\tilde T$ is used.


REFERENCES 


[1] Anderson, T. W. and Goodman, L. A. Statistical inference about Markov chains. Ann. 
math. Statist., 1957, 28, 89-110. 

[2] Goodman, L. A. A further note on "Finite Markov processes in psychology." Psycho-
metrika, 1953, 18, 245-248.

[3] Grenander, U. and Rosenblatt, M. Statistical analysis of stationary time series. New
York: Wiley, 1957.

[4] Kao, R. C. Note on Miller's "Finite Markov processes in psychology." Psychometrika,
1953, 18, 241-243.

[5] Kempthorne, O. The design and analysis of experiments. New York: Wiley, 1952. 

[6] Miller, G. A. Finite Markov processes in psychology. Psychometrika, 1952, 17, 149-167. 


Manuscript received 6/13/58
Revised manuscript received 11/8/58 














PSYCHOMETRIKA—VOL. 24, No. 2 
JUNE, 1959 


AN AUGMENTED MODEL FOR SPONTANEOUS REGRESSION 
AND RECOVERY* 


DAVID McCONNELL†


INDIANA UNIVERSITY 


The Estes model for spontaneous recovery and regression is modified by
relaxing the assumption that intraperiod stimulus fluctuation can be
treated as negligible in the general case. The augmented model allows description
of stimulus fluctuation both between and within experimental periods.
Application of the model to mock experimental situations illustrates some
of its properties.


This paper is based upon experimental findings reported in [5], and 
represents an effort to broaden the scope of Estes’ model for spontaneous 
regression and recovery [2] so that it may hold for a larger class of data. In 
the original model, Estes partitions the total population S* of stimulus 
elements available in the experimental situation into two sets S and S’. 
S is the set available during an experimental period, and S’ is the set un- 
available during an experimental period. The numbers of elements in the sets 
are related by the expression $N + N' = N^*$. The probability $j$, defined on $S$,
is the probability that an element selected at random will leave $S$ and enter
$S'$ during an instant $\Delta t$ of a rest period between experimental periods.
Similarly, $j'$ is the probability that an element from $S'$ will enter $S$ during that
instant.

These probabilities are undefined by Estes during experimental periods. 
In the augmented model they will be defined both within and between
experimental periods. As in the Estes model, the equation

$$\frac{N}{N^*} = \frac{j'}{j + j'}$$

relates these probabilities to the respective subset sizes. The augmented 
model thus retains the basic notation of the Estes model. It will, however, 
be more convenient to regard S as the set in which direct conditioning or 
extinction of stimulus elements is possible, and S’ as the set in which no 
direct conditioning is possible. From period to period, and during rest periods 
between experimental periods, S* is considered to remain unchanged, i.e., 


the same elements will be contained in S* and their number N* will be fixed. 


*This study is based upon a dissertation submitted to the Department of Psychology,
Indiana University, in partial fulfillment of the requirements for the Ph.D. degree.
†Now at Ohio State University.


The values of j and j’ may vary from intraperiod to interperiod intervals, 
but will be assumed constant, on the average, within any one interval. 
Lastly, the proportions of conditioned elements in S*, S, or S’ will be con- 
sidered to remain unaltered when values of j or j’ shift. 

The general situation requires a stimulus-sampling operator which
will carry out conditioning of new elements in $S$ for any values of the
parameters $j$, $j'$. This operator, $\phi$, must be such that the proportion of conditioned
elements at the conclusion of trial $n + 1$ is related to the proportion on the
previous trial by an expression such as $P_{n+1} = \phi P_n$. An operator invoked
some time ago by Estes and Burke [4], such that $P_{n+1} = P_n + \theta(1 - P_n)$
(where $\theta$ is the proportion of elements from $S$ sampled by the organism on
trial $n + 1$), will carry out conditioning in $S$ when $j$ and $j'$ are both 0, but
cannot apply unmodified when the exchange parameters are nonzero. It
should be observed that the applications made by these investigators involve
what Bush and Mosteller [1] call "experimenter-controlled" reinforcement
situations, while extension of the operator to the present experimental
context [5] entails "subject-controlled" reinforcement. This distinction,
at least in the case of animal conditioning, may be considered that between
respondent conditioning and operant conditioning.

The operator $\phi$ will be applied in the subset $S$ exclusively. Response
probabilities in $S'$ and in $S$, respectively, immediately after a trial on which
$\phi$ is applied will be related to probabilities immediately after the previous
trial by the following equations:

$$P'_{n+1} = P'_n(1 - j') + P_n\,jN/N', \tag{1'}$$

and

$$P_{n+1} = P_n(1 - j) + P'_n\,j'N'/N + \theta[1 - P_n(1 - j) - P'_n\,j'N'/N]. \tag{2'}$$


Equation (1') states that the proportion of conditioned elements in
$S'$ at the conclusion of trial $n + 1$ is equal to the proportion $(1 - j')$ of those
found in $S'$ at the conclusion of the preceding trial, plus the weighted
proportion $jN/N'$ of those found in $S$ at the conclusion of the preceding trial
(the latter having escaped from $S$ into $S'$ during the intertrial interval).
Equation (2') states that the proportion of conditioned elements in $S$ at the
conclusion of trial $n + 1$ is equal to the proportion $(1 - j)$ of those found in
$S$ at the conclusion of the preceding trial, plus the weighted proportion
$j'N'/N$ of those in $S'$ at the same time, plus the proportion $\theta$ of the
unconditioned elements from the same sources (these having been sampled
and conditioned on trial $n + 1$). Rewriting (1') and (2'), with $jN/N' = j'$ and
$j'N'/N = j$ (both of which follow from $N/N^* = j'/(j + j')$),


$$P'_{n+1} - (1 - j')P'_n - j'P_n = 0 \tag{1}$$

and

$$P_{n+1} - (1 - \theta)(1 - j)P_n - (1 - \theta)jP'_n = \theta. \tag{2}$$


These equations are the basic finite difference equations for the augmented 
model; both the exchange-free learning equation described by Estes and 
Burke [4] and Estes’ general expression (equation (3) of [2]) for spontaneous 
recovery and regression between periods can be derived from them as special 
cases. For the first case, note that when j = j’ = 0, (1) and (2) become 


$$P'_{n+1} - P'_n = 0 \tag{1a}$$

and

$$P_{n+1} - (1 - \theta)P_n = \theta. \tag{2a}$$

These have the solutions $P'_n = P'_0$ and $P_n = 1 - (1 - P_0)(1 - \theta)^n$, which
are identical with those of Estes and Burke [4].
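A minimal sketch that iterates the difference equations (1) and (2), assuming $jN/N' = j'$ as in the rewritten equations, so that no subset sizes are needed. With $j = j' = 0$ it should reproduce the Estes and Burke solutions just quoted. The function name and the numerical values are hypothetical.

```python
import numpy as np


def iterate_augmented(P0, P0_prime, theta, j, j_prime, n_trials):
    """Iterate (1) and (2) of the augmented model.

    Returns arrays P (set S) and Pp (set S') for trials 0, ..., n_trials.
    Both updates use the values from the previous trial.
    """
    P = np.empty(n_trials + 1)
    Pp = np.empty(n_trials + 1)
    P[0], Pp[0] = P0, P0_prime
    for n in range(n_trials):
        Pp[n + 1] = (1 - j_prime) * Pp[n] + j_prime * P[n]                           # (1)
        P[n + 1] = (1 - theta) * (1 - j) * P[n] + (1 - theta) * j * Pp[n] + theta  # (2)
    return P, Pp


# Exchange-free case j = j' = 0, with hypothetical theta and starting values.
P, Pp = iterate_augmented(0.2, 0.6, theta=0.3, j=0.0, j_prime=0.0, n_trials=10)
n = np.arange(11)
print(np.allclose(P, 1 - (1 - 0.2) * (1 - 0.3) ** n), np.allclose(Pp, 0.6))
```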
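For the second case, note that when $j$, $j'$ are not equal to 0, but $\theta = 0$,

$$P'_{n+1} = (1 - j')P'_n + j'P_n \tag{1b}$$

and

$$P_{n+1} = (1 - j)P_n + jP'_n. \tag{2b}$$

From (2b),

$$P'_n = \frac{P_{n+1} - (1 - j)P_n}{j} \quad\text{and}\quad P'_{n+1} = \frac{P_{n+2} - (1 - j)P_{n+1}}{j};$$

substituting back into (1b),

$$P_{n+2} - (2 - j - j')P_{n+1} + (1 - j - j')P_n = 0.$$

This last equation has a characteristic equation whose roots are

$$r_{1,2} = \frac{2 - j - j' \pm \sqrt{(2 - j - j')^2 - 4(1 - j - j')}}{2};$$

since the discriminant reduces to $(j + j')^2$,

$$r_1 = 1, \qquad r_2 = 1 - j - j'.$$

Then, writing $\alpha = 1 - j - j'$,

$$P_n = c_1(1 - j - j')^n + c_2(1)^n = c_1\alpha^n + c_2.$$

Thus

$$P_{n+1} = c_1\alpha^{n+1} + c_2,$$

whence, on substituting in (2b),

$$P'_n = -(j'/j)\,c_1\alpha^n + c_2.$$

From the equations for $P_n$ and $P'_n$,

$$P_0 = c_1 + c_2 \quad\text{and}\quad P'_0 = -(j'/j)\,c_1 + c_2,$$

whence

$$c_1 = \frac{j}{j + j'}(P_0 - P'_0) \quad\text{and}\quad c_2 = P_0 - c_1.$$

Thus,

$$P_n = \frac{j}{j + j'}(P_0 - P'_0)\alpha^n + P_0 - \frac{j}{j + j'}(P_0 - P'_0),$$

or, since

$$\frac{j}{j + j'} = 1 - J, \quad\text{where}\quad J = \frac{j'}{j + j'},$$

this can be written

$$P_n = P_0[J + (1 - J)\alpha^n] + P'_0(1 - J)(1 - \alpha^n).$$

This expression is identical, when $t$ is substituted for $n$, with Estes' basic
equation for spontaneous recovery and regression. The corresponding
equation for $S'$ is

$$P'_n = P_0J(1 - \alpha^n) + P'_0[1 - J(1 - \alpha^n)].$$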
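The closed forms for the $\theta = 0$ case can be checked against direct iteration of (1b) and (2b). The parameter and starting values below are hypothetical; as $\alpha^n \to 0$ both curves approach the common limit $JP_0 + (1 - J)P'_0$.

```python
import numpy as np

j, j_prime = 0.3, 0.1    # hypothetical exchange parameters
P0, P0_prime = 0.9, 0.2  # hypothetical starting proportions
n = np.arange(31)

# Iterate (1b) and (2b), the theta = 0 special case, from the previous trial's values.
P, Pp = [P0], [P0_prime]
for _ in range(30):
    P_next = (1 - j) * P[-1] + j * Pp[-1]
    Pp_next = (1 - j_prime) * Pp[-1] + j_prime * P[-1]
    P.append(P_next)
    Pp.append(Pp_next)
P, Pp = np.array(P), np.array(Pp)

# Closed forms derived above, with J = j'/(j + j') and alpha = 1 - j - j'.
J = j_prime / (j + j_prime)
alpha = 1 - j - j_prime
P_closed = P0 * (J + (1 - J) * alpha ** n) + P0_prime * (1 - J) * (1 - alpha ** n)
Pp_closed = P0 * J * (1 - alpha ** n) + P0_prime * (1 - J * (1 - alpha ** n))

print(np.max(np.abs(P - P_closed)), np.max(np.abs(Pp - Pp_closed)))
```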


For the general case, when $j$, $j'$ and $\theta$ are nonzero, note that (2) and (1)
are in the forms, respectively,

$$u_{x+1} - au_x - bv_x = \theta \tag{3}$$

and

$$v_{x+1} - dv_x - j'u_x = 0, \tag{4}$$

where $u = P$, $v = P'$, $x = n$, $a = (1 - \theta)(1 - j)$, $b = (1 - \theta)j$, $d = 1 - j'$.
From (3),

$$v_{x+1} = \frac{u_{x+2} - au_{x+1} - \theta}{b}, \qquad v_x = \frac{u_{x+1} - au_x - \theta}{b}.$$

Upon substituting in (4),

$$\frac{u_{x+2} - au_{x+1} - \theta}{b} - d\,\frac{u_{x+1} - au_x - \theta}{b} - j'u_x = 0, \tag{4a}$$

which becomes

$$u_{x+2} - (a + d)u_{x+1} + (ad - j'b)u_x = (1 - d)\theta, \tag{4b}$$

or

$$P_{n+2} - (a + d)P_{n+1} + (ad - j'b)P_n = j'\theta. \tag{5}$$

This is in the form

$$AP_{n+2} + BP_{n+1} + CP_n = D\eta^n, \tag{6}$$

for which the characteristic equation is

$$\psi(r) = Ar^2 + Br + C = 0. \tag{7}$$

If the two roots of this equation are real and unequal, whether positive or
negative, the explicit solution will have the following form:

$$P_n = \frac{D\eta^n}{\psi(\eta)} + c_1r_1^n + c_2r_2^n. \tag{8}$$

It can easily be proved that (7) can have no complex roots. It would have
such roots when $B^2 < 4C$, or when, expanding, $a^2 + 2ad + d^2 < 4ad - 4j'b$,
or, transposing, $a^2 - 2ad + d^2 < -4j'b$, or $(a - d)^2 < -4j'b$. But $(a - d)^2$
is never negative, while $j'$ and $b = (1 - \theta)j$ are positive, so the expression
$(a - d)^2 < -4j'b$ can never hold, and neither can the original expression $B^2 < 4C$.
Thus $r_1$ and $r_2$ must always be real; and since $B^2 - 4C = (a - d)^2 + 4j'b > 0$,
they are also unequal, as (8) requires.
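The argument that (7) has real, unequal roots rests on the identity $B^2 - 4C = (a - d)^2 + 4j'b$. A quick numerical check over a hypothetical grid of parameter values with $0 < \theta < 1$ and $0 < j, j' \leq 1$:

```python
import numpy as np

# Hypothetical grid of parameter values.
theta, j, j_prime = np.meshgrid(np.linspace(0.01, 0.99, 25),
                                np.linspace(0.01, 1.0, 25),
                                np.linspace(0.01, 1.0, 25), indexing="ij")
a = (1 - theta) * (1 - j)
b = (1 - theta) * j
d = 1 - j_prime
B = -(a + d)
C = a * d - j_prime * b
disc = B ** 2 - 4 * C
# The discriminant equals (a - d)^2 + 4 j' b, which is strictly positive on this grid.
print(np.allclose(disc, (a - d) ** 2 + 4 * j_prime * b), disc.min() > 0)
```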





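Now,

$$\frac{D\eta^n}{\psi(\eta)} = 1 \tag{9}$$

after substitution.* Thus, from (8),

$$P_n = 1 + c_1r_1^n + c_2r_2^n. \tag{10}$$

The roots of (7) are

$$r_{1,2} = \frac{-B \pm \sqrt{B^2 - 4C}}{2}, \tag{11}$$

since $A = 1$. From this it can be shown that

$$0 < r_2 < r_1 < 1 \tag{12}$$

after substitution. Therefore $P_\infty = 1$, and $P_0 = 1 + c_1 + c_2$. From (2) and
(10),

$$P'_n = \frac{1 + c_1r_1^{n+1} + c_2r_2^{n+1} - (1 - \theta)(1 - j)(1 + c_1r_1^n + c_2r_2^n) - \theta}{(1 - \theta)j} = 1 + k_1r_1^n + k_2r_2^n, \tag{13}$$

where

$$k_1 = \frac{c_1[r_1 - (1 - \theta)(1 - j)]}{(1 - \theta)j}, \qquad k_2 = \frac{c_2[r_2 - (1 - \theta)(1 - j)]}{(1 - \theta)j}. \tag{13a}$$

Then $P'_\infty = 1$, and $P'_0 = 1 + k_1 + k_2$. From this,

$$c_1 = \frac{(1 - \theta)j(P'_0 - 1) - [r_2 - (1 - \theta)(1 - j)](P_0 - 1)}{r_1 - r_2} \tag{10a}$$

and

$$c_2 = -(1 - P_0) - c_1. \tag{10b}$$

*$A = 1$, $B = -(a + d)$, $C = ad - j'b$, $D = j'\theta$, $\eta = 1$, $a = (1 - \theta)(1 - j)$,
$b = (1 - \theta)j$, $d = 1 - j'$.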


Now, from (10) and (10a), $P_n$ depends on $P'_0$ only through $c_1$ (with
$c_2 = -(1 - P_0) - c_1$), so that $P_n$ contains the term $QP'_0$, where
$Q = (1 - \theta)j(r_1^n - r_2^n)/(r_1 - r_2)$ is a variable coefficient, positive for $n \geq 1$ because of (12).

This relationship permits equal initial probabilities in the available 
set S to diverge as a period progresses, the higher curve obtaining for the 
higher starting probability in the unavailable set S’. It can also result in 
crossovers between initially disparate curves, such as those found in the 
relative frequency measures of the experiment [5] leading to this paper. 
Graphs of the augmented model, based on (10) and (13) above, and using 
dummy parameter values, appear in Figure 1. 
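The general solution can be assembled from (10), (10a), (10b), (13), and (13a) and compared with direct iteration of (1) and (2). The sketch also illustrates the relationship just described: with equal $P_0$, a higher $P'_0$ yields a higher $P_n$ on every later trial. The parameter values are hypothetical dummy values chosen so that $0 < r_2 < r_1 < 1$, as in (12).

```python
import numpy as np


def closed_form(P0, P0_prime, theta, j, j_prime, n):
    """P_n and P'_n from (10), (10a), (10b), (13) and (13a); n may be an array of trials."""
    a, b, d = (1 - theta) * (1 - j), (1 - theta) * j, 1 - j_prime
    B, C = -(a + d), a * d - j_prime * b
    root = np.sqrt(B ** 2 - 4 * C)             # real: B^2 - 4C = (a - d)^2 + 4 j' b > 0
    r1, r2 = (-B + root) / 2, (-B - root) / 2  # (11), with A = 1
    c1 = (b * (P0_prime - 1) - (r2 - a) * (P0 - 1)) / (r1 - r2)  # (10a)
    c2 = -(1 - P0) - c1                                            # (10b)
    k1, k2 = c1 * (r1 - a) / b, c2 * (r2 - a) / b                  # (13a)
    P = 1 + c1 * r1 ** n + c2 * r2 ** n                            # (10)
    Pp = 1 + k1 * r1 ** n + k2 * r2 ** n                           # (13)
    return P, Pp


def iterate(P0, P0_prime, theta, j, j_prime, steps):
    """Direct iteration of (1) and (2), for comparison."""
    P, Pp = [P0], [P0_prime]
    for _ in range(steps):
        p, pp = P[-1], Pp[-1]
        P.append((1 - theta) * (1 - j) * p + (1 - theta) * j * pp + theta)
        Pp.append((1 - j_prime) * pp + j_prime * p)
    return np.array(P), np.array(Pp)


params = dict(theta=0.1, j=0.2, j_prime=0.05)  # hypothetical dummy values
n = np.arange(41)
P_it, Pp_it = iterate(0.3, 0.6, steps=40, **params)
P_cf, Pp_cf = closed_form(0.3, 0.6, n=n, **params)
print(np.allclose(P_it, P_cf), np.allclose(Pp_it, Pp_cf))

# Equal P_0, different P'_0: the higher P'_0 gives the higher P_n after trial 0.
low, _ = closed_form(0.3, 0.2, n=n, **params)
high, _ = closed_form(0.3, 0.6, n=n, **params)
print(np.all(high[1:] > low[1:]))
```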

Figure 1 depicts the application of the augmented model to a mock 
experiment involving the same conditions reported in the empirical paper 
associated with this study [5]. Although care was taken to insure that the 
ordering of curves in Figure 1 corresponds to that obtained in the empirical 
study, this was done only to demonstrate that the results of that study 
could fall within the domain of the model presented here. No effort was 
made at curve fitting—until such time as new experimental predictions are 
made from the augmented model, curve fitting would be at best second 
guessing. 

The particular parameter values chosen do not in general uniquely 
determine the form of the functions shown; and these values were in some 
cases selected only to illustrate special points. The values of $P_0$ and $P'_0$ in
period I were taken from the observed relative frequencies correct in the
first minute of reinforced responding [5]. The high value of $\theta$ (.8) chosen for
periods I and II is intended to illustrate that even with very rapid
acquisition and extinction, reversals of initial probability relationships, such as
that demonstrated in period II of the empirical study, can be expected
between immediate and delayed extinction groups if intraperiod exchange
between $S$ and $S'$ is permitted. The large value of $j$, together with the
disproportionately small value of $j'$ chosen, is intended to emphasize that the
corresponding value of $J$ within the two periods (.0141) may be markedly
different from that prevailing between periods ($J = .8$ between periods I
and II).

In period III, one sees how a radical (80-fold) reduction in $\theta$ affects
terminal probability. Here, for the first time, response probability in $S'$
actually decreases across the period for two groups, a result not
superficially obvious in considering the augmented model.

The main object of the selection of parameter values for period IV is to 
illustrate the nonexchange learning situation for the same value of θ used
in the previous period, where exchange was permitted. Note that probabili- 
ties are stationary in the unavailable set, which in the augmented model is 
the exception rather than the rule. In the Estes model, this condition was 
assumed throughout experimental periods [2]. 





[Figure 1. Characteristics of the augmented model using dummy parameter values during conditioning and during the interpolated rest periods: Periods I and II ($\theta = .8$), Periods III and IV ($\theta = .01$); abscissa: trials. Vertical dashed lines mark transition points between different values of $j$. The probability plotted is that of the conditioned response.]


Discrepancies Between Initial Probabilities in S and S’ 


Above it is shown how an excessively high proportion of conditioned 
elements in the set S’ can retard extinction and bring about a reversal in 
ordering of immediate and delayed groups in a simulated experiment. Now 
consider the case in which $P'_0 < P_0$ at the start of an acquisition period.
Referring to (2) of the augmented model, and solving for $\Delta P_n$, gives

$$\Delta P_n = \theta(1 - P_n) + j(1 - \theta)(P'_n - P_n). \tag{14}$$

It is clear from this expression that if the factor $(P'_n - P_n)$ is negative and
sufficiently large in magnitude, the increment $\Delta P_n$ becomes negative. This implies that
probability in the available set will decrease until such time as the condition

$$P_n - P'_n = \frac{\theta(1 - P_n)}{j(1 - \theta)}$$

holds, and $\Delta P_n = 0$. When $\theta$ is large and conditioning proceeds rapidly, such
an initial discrepancy may produce no visible effect on the conditioning curve.
But when $\theta$ is not large, there should be a detectable dip in response
probability in $S$ at the start of an acquisition period.
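For reference, (14) follows from (2) by subtracting $P_n$ from both sides; a short derivation using only the quantities already defined:

$$\begin{aligned}
\Delta P_n = P_{n+1} - P_n &= (1 - \theta)(1 - j)P_n + (1 - \theta)jP'_n + \theta - P_n \\
&= \theta - [\theta + j(1 - \theta)]P_n + j(1 - \theta)P'_n \\
&= \theta(1 - P_n) + j(1 - \theta)(P'_n - P_n).
\end{aligned}$$

Setting $\Delta P_n = 0$ in the last line gives the stationarity condition stated above.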

[Figure 2. Curves of proportions of conditioned elements in $S$ and $S'$ for a dummy experiment in which $\theta = .01$, $j = .20$, $j' = .05$, and the starting probabilities are $P_0 = 1$, $P'_0 = 0$. Ordinate: proportion of conditioned elements; abscissa: trials ($n$), 100 to 1000.]

Figure 2 shows a low-$\theta$ dummy experiment, in which by trial 20
the dip in available set probability has reversed. Note that after its initial
intersection with the curve for the unavailable set, the available set curve 
increases monotonically below the other curve. These same equations may 
be applied to an extinction period by conceiving of the period as one of 
acquisition of the previously unreinforced response. We may conjecture that
when initial probability of the response to be extinguished is less in the available
set than is the proportion of elements conditioned to it in S’, there should 
be detectable an initial rise in response probability, preceding the typically 
decreasing extinction function. In general, however, exaggerated discrepancies 
in probabilities such as could obtain at the end of an acquisition period 
during which values of j and j’ are both low, will not result in these early
phenomena in a subsequent period, because there is no reason to assume 
changes in the parameters j and j’. 
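The dip described for Figure 2 can be located numerically by iterating (1) and (2) with the parameter values given in its caption. This is a minimal sketch that assumes the caption's starting values are $P_0 = 1$, $P'_0 = 0$ and that $jN/N' = j'$; it reports where $P_n$ bottoms out rather than reproducing the plotted curves.

```python
import numpy as np

theta, j, j_prime = 0.01, 0.20, 0.05  # values given for the Figure 2 dummy experiment
P, Pp = [1.0], [0.0]                   # assumed starting values P_0 = 1, P'_0 = 0
for _ in range(1000):
    p, pp = P[-1], Pp[-1]
    P.append((1 - theta) * (1 - j) * p + (1 - theta) * j * pp + theta)  # (2)
    Pp.append((1 - j_prime) * pp + j_prime * p)                         # (1)
P, Pp = np.array(P), np.array(Pp)

dP = np.diff(P)  # Delta P_n, as in (14)
n_min = int(np.argmin(P))
print("P_n reaches its minimum", round(float(P[n_min]), 3), "on trial", n_min)
print("P_1000 =", round(float(P[-1]), 3), "P'_1000 =", round(float(Pp[-1]), 3))
```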

Without such changes the exchange of elements will proceed no more 
rapidly than in the preceding period, with the result that the arrival of 
elements from S’ will be offset by conditioning at the same rate as in the 
previous period. This will be true also for the extinction phenomenon. Barring 
the assumption of changes in the drift parameters from one period to the next, 
the only way to demonstrate these phenomena is to produce the necessary 
probability discrepancies “artificially.” Consider, for example, certain
elements introduced into the available set at the start of acquisition, con- 
ditioned to the reinforced response in sufficient number to raise the initial 
probability of the response in S markedly above the corresponding proportion 
of conditioned elements in S’. Then an influx of elements from S’ could be 
expected to bring about the initial dip in question. The elements introduced 
might be associated with transitory phenomena such as temperature changes, 
handling by the experimenter, or transfer effects from other situations. 
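The “artificial discrepancy” argument can be made concrete with a small simulation. The sketch below is not McConnell's equations (10) and (18), which are not reproduced in this section; it assumes a hypothetical two-set process in which, on each acquisition trial, (i) a fraction theta of the unconditioned elements in S becomes conditioned and (ii) a fraction j of S then trades places with a fraction j_prime of S', with set sizes chosen so that j·|S| = j_prime·|S'| and both sets keep their size. Under those assumptions the update is linear in the proportions p (for S) and p_prime (for S'), and starting S well above S' produces the predicted initial dip.

```python
def simulate_acquisition(p0, p0_prime, theta, j, j_prime, trials):
    """Hypothetical two-set exchange model (an illustrative assumption, not eqs. (10) and (18)).

    p        -- proportion of conditioned elements in the available set S
    p_prime  -- proportion of conditioned elements in the unavailable set S'
    The set sizes are assumed to satisfy j * |S| == j_prime * |S'|, so the same
    number of elements moves in each direction and both sets keep their size.
    """
    p, p_prime = p0, p0_prime
    history = [(p, p_prime)]
    for _ in range(trials):
        # Conditioning: a fraction theta of the unconditioned elements in S becomes conditioned.
        p = p + theta * (1.0 - p)
        # Exchange: fraction j of S trades places with fraction j_prime of S' (simultaneous update).
        p, p_prime = (1.0 - j) * p + j * p_prime, (1.0 - j_prime) * p_prime + j_prime * p
        history.append((p, p_prime))
    return history


# Parameter values borrowed from the Figure 2 caption: S starts fully conditioned, S' not at all.
hist = simulate_acquisition(p0=1.0, p0_prime=0.0, theta=0.01, j=0.20, j_prime=0.05, trials=1000)
s_curve = [p for p, _ in hist]
low_trial = min(range(len(s_curve)), key=s_curve.__getitem__)
print(f"p falls to {s_curve[low_trial]:.3f} at trial {low_trial}, then recovers to {s_curve[-1]:.3f}")
```

Because conditioning acts only in S in this simplification, p settles slightly above p_prime once the two sets have mixed; the crossing of the curves described for Figure 2 depends on features of the augmented model that this sketch does not attempt to capture.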


Period Spacing 


Estes, still with the assumptions that j = j' = 0 within periods and that terminal response probability for the reinforced response in each period is 1, predicts that increasing time between periods will accelerate the accumulation of conditioned elements in S* [3]. The augmented model permits the same prediction, but qualifies it to the extent that if j and j' are high during an experimental period, increasing the time between periods will have virtually no effect on response probability. To document this qualification, consider the extreme case of j = j' = 1 within experimental periods. Under this condition, the general forms of the solutions (10) and (18) still apply.
However, both equations now have roots ±√(1 − θ). The only effect of the
negative root, since trials are discrete, is to introduce a slight oscillation into the growth curves of the proportions of conditioned elements in S and S'. But the outcome of the increase in j and j' is to accelerate drastically the accumulation of conditioned elements in S'. After several trials the proportions of conditioned elements would be indistinguishable from 1 in both S and S'. When, therefore, j and j' are large, so that equal limits are quickly approached by response probabilities in S and S', it is clear that increasing time between periods cannot increase the over-all rate of conditioning in S*.
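The oscillation produced by the negative root can be seen in a minimal sketch of the extreme case. It assumes a hypothetical discrete process, not the paper's solutions: S and S' are equal in size, a fraction theta of the unconditioned elements in S is conditioned on each trial, and with j = j' = 1 the two sets then swap all of their elements. Under these assumptions the unconditioned proportion q = 1 − p in S shrinks by the factor (1 − θ) every two trials, so the two-step recursion has characteristic roots ±√(1 − θ), and the growth curve rises in a staircase rather than smoothly.

```python
import math


def full_exchange_run(theta, trials, p0=0.0, p0_prime=0.0):
    """Hypothetical sketch: j = j' = 1, so S and S' swap all elements on every trial.

    Conditioning (rate theta) acts on S before the swap; the sets are assumed equal in size.
    Returns the list of (p, p_prime) after each trial.
    """
    p, p_prime = p0, p0_prime
    out = []
    for _ in range(trials):
        p = p + theta * (1.0 - p)   # conditioning in the available set
        p, p_prime = p_prime, p     # complete exchange between S and S'
        out.append((p, p_prime))
    return out


theta = 0.01
run = full_exchange_run(theta, trials=400)
# Unconditioned proportion in S; over two trials it shrinks by the factor (1 - theta).
q = [1.0 - p for p, _ in run]
ratios = [q[n + 2] / q[n] for n in range(1, 10)]
print("two-step ratios:", [round(r, 6) for r in ratios])
print("characteristic roots:", math.sqrt(1 - theta), -math.sqrt(1 - theta))
```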










Estes also predicts that in a sequence of randomly alternated acquisition and extinction periods, response probability in S* will approach the asymptote π, the proportion of acquisition periods in the sequence, and that this asymptote will be independent of values of J and of the time between periods. In general, predictions for mixed sequences, whether based on the augmented model or on Estes' model, must be applied cautiously. Both models result in the prediction that π will be the expected value at time t following the mth in a sequence of such periods. But while the approach of the expectation to π is monotonic across the sequence, it is by no means true that for a fixed experimental sequence the interperiod asymptotes must each approach π.

FIGURE 3
[Plot: response probability in S (p) and S' (p') across nine experimental periods (I to IX), each followed by a rest period, for four dummy experiments: (1) j = j' = 0 in experimental periods, J = .5 in rest periods, π = .5; (2) j = j' = 1, J = .5, π = .5; (3) j = j' = 0, J = .5, π = .67; (4) j = j' = 1, J = .5, π = .67.]
Response probability in S and S' for four dummy experiments involving a mixed sequence of acquisition and extinction periods.

This point is illustrated in Fig. 3, representing four dummy experiments, for each of which the proportion of conditioned elements in both available and unavailable sets is plotted period by period. Examination of the first and third pairs of these curves demonstrates that even with the Estes assumption (j = j' = 0), regular oscillation can be expected to continue indefinitely for any single subject or for any group whose members are identically treated,



















in spite of the fact that the mean interperiod asymptote is clearly π. This oscillation should of course disappear if Ss are randomly assigned to different schedules and response curves are plotted as the mean of all measures at a given trial or time. The second and fourth pairs of curves in Figure 3 show what should be expected in the extreme case of the augmented model when j = j' = 1, and when π is the same as in the first and third experiments, respectively. The extrema of the oscillations may be considerably greater than those for the Estes model, and this should be true to some degree even for j and j' less than 1. However, it is equally clear that the mean interperiod asymptote in the extreme oscillatory cases is still π, as it is for the Estes model. The advantage of the augmented model for long, mixed sequences of periods would thus appear to accrue from its ability to interpret interperiod asymptotes which are inordinately higher or lower than the expected value π. Since, however, some oscillation from period to period is to be expected with the Estes assumptions, the intraperiod exchange notion should not be applied on this basis alone. The criteria of intraperiod reversals of order or nonmonotonic response functions should first be invoked, since only they are unique to the free-exchange situation.
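The contrast between a single fixed schedule and the average over randomly assigned schedules can be sketched under the Estes assumptions (j = j' = 0 within periods, with S ending each acquisition period fully conditioned and each extinction period fully extinguished). The sketch further assumes, for illustration only, that S and S' are equal in size, that S' starts naive, and that a fraction J of each set is exchanged during every rest period; it is not the paper's own computation. A strictly alternating schedule keeps oscillating, while the mean over random schedules with acquisition probability π settles near π.

```python
import random


def interperiod_values(sequence, J):
    """Proportion of conditioned elements in S after each rest period (illustrative sketch).

    sequence -- 1 for an acquisition period, 0 for an extinction period
    J        -- fraction of each set exchanged during a rest period (equal-sized S and S')
    Assumes the Estes conditions: j = j' = 0 within periods, so S ends each period
    fully conditioned (1) or fully extinguished (0) while S' is left unchanged.
    """
    p_prime = 0.0  # assumed naive unavailable set before the first period
    values = []
    for x in sequence:
        p = float(x)
        # Rest period: S and S' exchange a fraction J of their elements.
        p, p_prime = (1 - J) * p + J * p_prime, (1 - J) * p_prime + J * p
        values.append(p)
    return values


# A single fixed schedule that alternates acquisition and extinction keeps oscillating.
fixed = interperiod_values([1, 0] * 30, J=0.5)
print("last two interperiod values:", round(fixed[-2], 4), round(fixed[-1], 4))

# Averaging over randomly generated schedules with pi = .67 recovers pi.
rng = random.Random(0)
pi = 0.67
finals = [interperiod_values([1 if rng.random() < pi else 0 for _ in range(40)], J=0.5)[-1]
          for _ in range(5000)]
print("mean final interperiod value:", round(sum(finals) / len(finals), 3))
```

With J = .5 the alternating schedule settles into a two-period cycle between 2/3 and 1/3, whose mean is .5, which is the pattern the text describes for a fixed schedule under the Estes assumptions.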

The potential applicability of the augmented model to a variety of 
psychological situations appears to the author to be derived from the simple 
notion that stimulus fluctuation within even relatively short experimental 
periods can be evaluated by means of the simultaneous equations proposed, 
without being relegated to the status of a variable “outside” the model. It 
is conceded that these equations do not simplify the mathematics of Estes’ 
basic model, but relief is in sight on this score from digital computers. A 
number of routines are now available for simultaneous solution of n equations 
using trial values of constants by use of high-speed computers such as the 
IBM 650 and 701 or 704. These routines are to be found in computing facility 
libraries at Indiana and Ohio State, and are available by request. 


REFERENCES 


[1] Bush, R. R. and Mosteller, F. Stochastic models for learning. New York: Wiley, 1955. 

[2] Estes, W. K. Statistical theory of spontaneous recovery and regression. Psychol. Rev., 
1955, 62, 145-154. 

[3] Estes, W. K. Statistical theory of distributional phenomena in learning. Psychol. Rev., 
1955, 62, 369-377. 

[4] Estes, W. K. and Burke, C. J. A theory of stimulus variability in learning. Psychol. 
Rev., 1953, 60, 276-286. 

[5] McConnell, D. G. Spontaneous regression and recovery in a sequence of discrimination periods. J. exp. Psychol., 1959, 57, 121-129.


Manuscript received 4/1/58 
Revised manuscript received 8/22/58 




















PSYCHOMETRIKA—VOL. 24, No. 2 
JUNE, 1959 


A MODEL FOR ORDERED METRIC SCALING BY 
COMPARISON OF INTERVALS* 


ROBERT F. FAGOT†


UNIVERSITY OF OREGON 


This paper presents a model of individual choice behavior for appli- 
cation to experimental situations in which a subject is required to compare 
utility intervals (differences in subjective value). This model is contrasted 
with a weaker model, which is also derived. Both models generate ordered 
metric scales, but differ in predictive power. An experiment on the utility of 
grades, which provides a test and comparison of the models, is presented. 


Coombs [2] introduced the term ordered metric to denote those scales 
which provide an ordering of the set of alternatives, or stimuli (the single 
property of an ordinal scale), and in addition at least a partial ordering of the 
distances between stimuli (utility intervals). Siegel [8] presented a method, 
based on a one-person game, which generated a so-called higher-ordered 
metric scale, i.e., a complete ordering on utility intervals. The present model 
also generates a higher-ordered metric scale. The term ordered metric scale, 
as used in this paper, will always refer to a higher-ordered metric scale, as 
the term is used by Siegel [8], and will be abbreviated OM. 

Coombs presented no formal characterization of the OM scale, nor did 
Siegel; one of the purposes of this paper is to present such a characterization. 
Two important results will be presented in relation to the properties of such 
scales. First, it will be shown that it is possible to construct a scale which 
satisfies the above two properties (an ordering of the set of stimuli, and an 
ordering of utility intervals), but which is inadmissible as an OM scale. This 
implies, of course, that the intuitive definition of an OM scale given above is 
inadequate, since it admits to the class of OM scales types which were not 
originally intended. Second, it will be shown that it is possible to have two 
or more models, all of which generate OM scales, but which differ in predictive 
power. 

The subject will be required to make judgments in relation to pairs, 
triads, and tetrads of stimuli. For pairs of stimuli, he will have the traditional 
task of choosing one member of the pair on some specified basis. For triads 

*This research was supported in part by Group Psychology Branch, Office of Naval Research, under Contract Nonr 225(17), with Stanford University, and in part by U. S. Public Health Service grant M.

†The author wishes to acknowledge his indebtedness to E. W. Adams, University of California, whose constructive criticism did much to improve the quality of this work.




and tetrads of stimuli, he will be required to make comparative judgments 
of intervals. As an illustration, if three stimuli have been ordered by the 
method of pairs xyz, then the subject could be required to make the judgment 
whether y is more similar to x than to z. The particular relation utilized (in 
this illustration more similar) will depend on the experimental situation, but 
in all cases the subject will be required to make a judgment which permits 
an inference as to the ordering of two utility intervals. For example, if as above the stimuli are ordered xyz, and y is judged more like x than like z, then it will be inferred that y is closer on the subject's utility scale to x than to z; i.e., the utility interval between y and z is greater than the utility interval between x and y. If tetrads are used, the judgment will be of the form “the difference (along some specified dimension) between x and y is greater than the difference between z and w.”

Although the fundamental relation between pairs of stimuli is interpreted 
in this paper in terms of direct comparisons of utility intervals, as illustrated 
above, the formal structure of the model is applicable to any method, such as 
Siegel’s one-person game [8], for which each judgment implies an ordering 
of two utility intervals. 


Formal Structure of the Model 


The model is based on three primitive notions. The first is a set K, 
interpreted as the set of alternatives, or stimuli. The second is a quaternary 
relation Q on the cartesian product K × K. The relation Q is interpreted such that xyQzw holds when the difference in subjective value (utility) between x and y is judged greater than the difference between z and w. If the alternatives are objects among which the subject is stating preferences, then xyQzw
implies that the difference in strength of preference (value) between x and y 
is greater than the difference between z and w; the subject more strongly 
prefers x to y than z to w. If the four alternatives are tones which differ only 
in frequency, then xyQzw implies that the difference in pitch between x and 
y is judged by the subject to be greater than the difference in pitch between 
z and w. The third primitive is a binary relation P on the set of alternatives 
K. P is the relation of strict preference; xPy holds if and only if the subject 
strictly prefers x to y. 

A numerical interpretation of the relations P and Q is given with the 
following definition of a quaternary utility function. 


DEFINITION 1. Let U be a real-valued function defined over K; then U is a quaternary utility function if it satisfies the following conditions for every x, y, z, and w in K:

(a) xPy if and only if U(x) > U(y);

(b) xyQzw if and only if U(x) − U(y) > U(z) − U(w).
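Definition 1 can be read as a recipe for generating the relations P and Q from any assignment of utilities. The sketch below does exactly that for a hypothetical three-stimulus example; the function name and the utility values are illustrative assumptions, not part of the model.

```python
from itertools import product


def relations_from_utility(U):
    """Derive P and Q from a utility assignment U (dict: stimulus -> real), per Definition 1.

    P contains (x, y) with U[x] > U[y]; Q contains (x, y, z, w) with
    U[x] - U[y] > U[z] - U[w].
    """
    K = list(U)
    P = {(x, y) for x, y in product(K, repeat=2) if U[x] > U[y]}
    Q = {(x, y, z, w) for x, y, z, w in product(K, repeat=4)
         if U[x] - U[y] > U[z] - U[w]}
    return P, Q


# Hypothetical utilities for three stimuli, chosen only for illustration.
U = {"x": 10.0, "y": 7.0, "z": 2.0}
P, Q = relations_from_utility(U)
print(("x", "y") in P)            # True: x is preferred to y
print(("y", "z", "x", "y") in Q)  # True: interval yz (5) exceeds interval xy (3)
```

The empirical question the model raises runs the other way: given observed P and Q, does some U produce exactly those relations?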


The basic assumption of the model may now be stated. 







QUATERNARY HYPOTHESIS (QH). There exists a quaternary utility function for the relations P and Q and set K, with the following restrictions.

(a) For every x and y in K, if x ≠ y, then U(x) ≠ U(y).

(b) For every x, y, z, and w in K, if U(x) > U(y) and U(z) > U(w) and x ≠ z or y ≠ w, then U(x) − U(y) ≠ U(z) − U(w).


Restrictions (a) and (b) are necessary if the relations P and Q are to 
have the desired ordering properties, which will be clear after consideration 
of empirical consequences 1 and 3, presented in the following section. Condition (b), of course, rules out the possibility that an interval U(x) − U(y) equals some other interval U(z) − U(w). However, this exclusion is consistent with the intended application of the model to situations of strict
preference. 

For every choice that a subject makes, we may write a corresponding 
inequality in utility values (Definition 1). An experiment would produce a 
set of inequalities for each subject, and a quaternary utility function exists 
for a particular subject (the model holds exactly) if and only if the resulting 
set of inequalities is consistent, i.e., has a solution. There are well-known 
decision procedures for determining whether or not a set of inequalities has 
a solution [e.g., 6]. 

Satisfaction of the QH is sufficient to guarantee the existence of an OM 
scale, but it is not necessary. This follows from the fact that, as will be shown 
in the next section, it is possible to derive from the QH consequences which, 
if satisfied, guarantee the existence of an OM scale, yet these consequences 
do not exhaust the empirical significance of the QH. This means that the 
QH may be disconfirmed even if an OM scale exists. Hence one important 
problem is to find the weakest conditions imposed by the QH which are 
capable of generating the desired OM scale. An allied problem is to define 
progressively stronger alternative models in terms of the empirical conse- 
quences of the basic hypothesis, so that it will be possible to state precisely 
how strong a model is sustained by the data. In the following section some 
of the empirical consequences which follow from the basic hypothesis are 
specified and in the process two alternative models are defined. 


Some Empirical Consequences of the Quaternary Hypothesis 


By empirical or observable consequence of the QH is meant any consequence implied by the QH about the P and Q relations themselves. Unfortunately, a result obtained by Scott and Suppes ([7], pp. 16-27) implies
that there is no simple set of consequences of the type to be considered which 
completely exhausts the empirical significance of the QH for all finite sets of 
stimuli. Derivation of the empirical consequences, and proofs of certain 
propositions, such as the independence of these consequences, will not be 
presented, since they are mostly simple, tedious, and mathematically un- 
interesting. 













The first consequence of the QH is that the binary relation P is a strict 
simple ordering of K. The terminology follows that of Suppes [9]. 


EMPIRICAL CONSEQUENCE 1 (C1). The binary relation P is a strict simple ordering of K; P satisfies the following conditions for every x, y, and z in K:

(a) exactly one of the following holds: x = y, xPy, yPx;

(b) if xPy and yPz then xPz.


Thus if C1 is satisfied it becomes possible to derive a ranking without ties of the alternatives in K.

One is perhaps used to thinking of this consequence as the sole axiom 
of an ordinal model, but certain Q relations are predictable from P relations, 
and hence the ordinal model must be defined to include an axiom formulating 
the connection between P and Q where it is possible to predict Q preferences 
from P preferences. For example, if three alternatives are ordered xPyPz, 
then this knowledge alone implies that the distance between x and z is greater 
than the distance between x and y, and also greater than the distance between 
y and z. This notion is formalized below. 


EMPIRICAL CONSEQUENCE 2 (C2). For every x, y, z, and w in K,
(a) if xPy and yPz, then xzQxy and xzQyz;
(b) if xPy, yPz, and zPw, then xwQyz.


In most preference experiments, one would hardly expect C2 to be disconfirmed if C1 is satisfied, but this confidence might not be at all sustained in certain psychophysical experiments. In any event, C1-2 are the axioms of the Ordinal Model.

One further consequence of the basic hypothesis permits the formulation 
of a model which generates an OM scale. Consider a subset P of K × K such that for all (x, y) in K × K, (x, y) belongs to P if and only if xPy. P then is the subset of K × K which is the set of ordered pairs (x, y) with xPy. Then one obvious consequence of the QH is that Q is a strict simple ordering of P.


EMPIRICAL CONSEQUENCE 3 (C3). The quaternary relation Q is a strict simple ordering of P (Q is asymmetric, transitive, and connected in P); i.e., for every (x, y), (z, w), and (u, v) in P,

(a) exactly one of the following holds: (x, y) = (z, w), xyQzw, or zwQxy;

(b) if xyQzw and zwQuv, then xyQuv.


The intuitive statement made in the introduction that a complete 
ordering on utility intervals is required for an OM scale corresponds to the 
formal requirement that Q be a strict simple ordering of P. The restriction to 
P is a restriction to positive differences in utility. 

The fact that C2 and C3 are independent has some interesting impli- 
cations. It will be recalled that the intuitive definition of an OM scale requires 













only that C1 and C3 be satisfied. However, C2 could be disconfirmed even if C1 and C3 were satisfied. Hence C1 and C3 are not sufficient for the derivation of an OM scale; C2 is necessary. This is the point of the remark made
in the introduction that the intuitive definition of an OM scale as requiring 
an ordering of the stimuli and an ordering of utility intervals was inadequate. 

C1-3 do not exhaust the empirical significance of the QH. This fact, 
coupled with the fact that these consequences are sufficient to generate an 
OM scale, leads to the important result that an OM scale may exist even when 
the QH is disconfirmed. Such a model is called a weak ordered metric model 
to distinguish it from the stronger model which assumes the existence of a 
quaternary utility function. Empirical consequences 1, 2, and 3 define the 
Weak Ordered Metric Model (hereafter abbreviated as WOM); i.e., C1-3 are 
the axioms of the WOM. It should be noted that although the term weak is 
used, the model still generates a higher-ordered metric scale. 

Although C1-3 are the set of necessary and sufficient conditions for the 
existence of an OM scale, it is of considerable interest to formulate further 
consequences of the QH, for a number of reasons. First, completely separate 
from the problem of measurement, these consequences could be used for the 
prediction of choice behavior. Second, if the QH is disconfirmed, then it 
becomes possible to state precisely in which way it was disconfirmed by 
examining each of the consequences. Third, the formulation of such conse- 
quences permits the construction of progressively stronger models of choice 
behavior which can be compared in experimental situations. Finally, the two 
consequences to be presented exhaust the empirical significance of the QH 
under certain special conditions to be discussed after their presentation. The 
first of these consequences is presented below. 


EMPIRICAL CONSEQUENCE 4 (C4). For all x, y, z, and w in K, if xyQzw then xzQyw, wyQzx, and wzQyx.


This consequence is stated in its most general form, but in the usual experimental situation the observation xyQzw implies xPy and zPw, in which case the consequence could be simplified to “if xyQzw then xzQyw” for direct prediction purposes. In any event, for any given ordering of stimuli, C4 will have only one conclusion with empirical content, not three. Its empirical content corresponds to the arithmetical fact that a > b if and only if a + x > b + x. For example, suppose four stimuli are ordered xPyPzPw. Corresponding to “if xyQzw, then xzQyw” is, writing capital letters for the utilities of the stimuli, “if X − Y > Z − W, then X − Z > Y − W.” The conclusion can be written (X − Y) + (Y − Z) > (Y − Z) + (Z − W), from which it can be seen that the interval (Y − Z) is common to both sides of the inequality, which can be reduced to X − Y > Z − W, which is, of course, the premise.
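The arithmetic behind C4 can be spot-checked numerically: whenever utilities satisfy the premise xyQzw, all three conclusions should hold. The sketch below draws hypothetical integer utilities (integers keep every comparison exact) and asserts each conclusion; it is a sanity check of the identity, not a proof and not a test of any subject's data.

```python
import random


def q(U, x, y, z, w):
    """xyQzw for a quaternary utility function U: U(x) - U(y) > U(z) - U(w)."""
    return U[x] - U[y] > U[z] - U[w]


rng = random.Random(1)
checked = 0
for _ in range(10_000):
    # Hypothetical integer utilities, so the comparisons are exact.
    U = {s: rng.randint(0, 1000) for s in "xyzw"}
    if q(U, "x", "y", "z", "w"):
        checked += 1
        # C4: each of the three conclusions must hold whenever the premise does.
        assert q(U, "x", "z", "y", "w")
        assert q(U, "w", "y", "z", "x")
        assert q(U, "w", "z", "y", "x")
print(f"C4 held in all {checked} sampled cases where the premise was true")
```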

The following example shows that C4 has implications which C1-3 
do not. Suppose again that four stimuli are ordered xPyPzPw, and further 













that the six utility intervals are ordered xwQxzQywQzwQxyQyz. P and Q are transitive and hence C1 and C3 are satisfied. Inspection of the ordering of
the intervals indicates that C2 is also satisfied. Note, however, that the 
observation xzQyw implies xyQzw according to C4. But zwQxy was observed; 
therefore C4 is disconfirmed. This means that C4 has implications which 
C1-3 do not. This result confirms the important conclusion already stated, 
that C1-3 do not exhaust the empirical significance of the QH, in spite of the 
fact that these consequences are sufficient to generate an OM scale. 
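The four-stimulus example can be checked mechanically. The sketch below encodes the stated ordering xPyPzPw and the observed interval ranking xw > xz > yw > zw > xy > yz, confirms that C2 holds (C1 and C3 hold by construction, since both orderings are rankings), and applies the simplified C4 rule “if xyQzw then xzQyw” to every observed comparison. The helper names are illustrative.

```python
from itertools import combinations, permutations

order = ["x", "y", "z", "w"]  # xPyPzPw
# Observed ordering of the six intervals, largest first: xw > xz > yw > zw > xy > yz.
intervals = [("x", "w"), ("x", "z"), ("y", "w"), ("z", "w"), ("x", "y"), ("y", "z")]
rank = {iv: i for i, iv in enumerate(intervals)}  # smaller rank means larger interval


def Q(a, b):
    """Interval a was judged greater than interval b."""
    return rank[a] < rank[b]


def c2_holds():
    # C2(a): for every triple in the order, the outer interval exceeds both inner ones.
    for a, b, c in combinations(order, 3):
        if not (Q((a, c), (a, b)) and Q((a, c), (b, c))):
            return False
    # C2(b): for a run of four, the outermost interval exceeds the innermost one.
    for a, b, c, d in combinations(order, 4):
        if not Q((a, d), (b, c)):
            return False
    return True


def c4_violations():
    """Apply 'if xyQzw then xzQyw' to each observed Q and collect contradicted conclusions."""
    bad = []
    for a, b in permutations(intervals, 2):
        if not Q(a, b):
            continue
        (x, y), (z, w) = a, b
        left, right = (x, z), (y, w)
        # Only conclusions comparing two positive intervals (xPz and yPw) are checked here.
        if left in rank and right in rank and Q(right, left):
            bad.append((a, b, left, right))
    return bad


print("C2 satisfied:", c2_holds())
for premise, other, left, right in c4_violations():
    print(f"premise {''.join(premise)}Q{''.join(other)} predicts "
          f"{''.join(left)}Q{''.join(right)}, but the reverse was observed")
```

The single violation reported is the one discussed in the text: xzQyw predicts xyQzw, whereas zwQxy was observed.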
One final consequence of the QH is now presented. 


EMPIRICAL CONSEQUENCE 5 (C5). For all x, y, z, w, u, and v in K, if xyQzw and uxQwv, then uyQzv.


The empirical content of this consequence is readily seen if the relations are written in terms of the corresponding utility values: if X − Y > Z − W and U − X > W − V, then U − Y > Z − V. Adding the two inequalities of the premise gives (U − X) + (X − Y) > (Z − W) + (W − V), which of course simplifies to U − Y > Z − V, the conclusion. The arithmetical interpretation is (i) if m > n and r > s, then m + r > n + s. In a similar way one can show that C5 also has the arithmetical interpretation (ii) if r > n and m + n > r + s, then m > s.

When an implication of C5 involves four stimuli (e.g., abQcd), then this 
Q relation can also be predicted by C4, although the two predictions may 
differ. Thus one of the subjects in the experiment to be reported made the 
following choices: dfQab, bfQad, cdQbc, and acQcf. The first choice implies 
the second by C4, but the third and fourth imply adQbf by C5, which con- 
tradicts the second. This shows that C5 has implications which C1-4 do not, 
since the total set of observations was consistent with C1-4, but C5 was 
disconfirmed. 
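The subject's four choices can be run through C4 and C5 mechanically. The sketch below writes each choice xyQzw as a tuple (x, y, z, w), applies C4 to every choice and C5 to every ordered pair of choices, and reports each C5 conclusion whose reverse was observed or is implied by C4. The encoding and helper names are illustrative assumptions.

```python
from itertools import permutations

# Four choices of one subject, written (x, y, z, w) for xyQzw.
observed = {("d", "f", "a", "b"), ("b", "f", "a", "d"),
            ("c", "d", "b", "c"), ("a", "c", "c", "f")}


def c4(rel):
    """C4: xyQzw implies xzQyw, wyQzx and wzQyx."""
    x, y, z, w = rel
    return {(x, z, y, w), (w, y, z, x), (w, z, y, x)}


def c5(r1, r2):
    """C5: from xyQzw and uxQwv infer uyQzv; None if the two premises do not chain."""
    x, y, z, w = r1
    u, x2, w2, v = r2
    return (u, y, z, v) if (x2, w2) == (x, w) else None


def reverse(rel):
    """The relation that contradicts xyQzw, namely zwQxy."""
    x, y, z, w = rel
    return (z, w, x, y)


by_c4 = set()
for rel in observed:
    by_c4 |= c4(rel)

by_c5 = set()
for r1, r2 in permutations(observed, 2):
    conclusion = c5(r1, r2)
    if conclusion is not None:
        by_c5.add(conclusion)

conflicts = {rel for rel in by_c5 if reverse(rel) in observed | by_c4}
for rel in sorted(conflicts):
    print("C5 predicts", "".join(rel[:2]) + "Q" + "".join(rel[2:]),
          "but its reverse is observed or C4-implied")
```

Besides adQbf, this enumeration also flags cfQac, which follows by C5 from the first and third choices (D − F > A − B and C − D > B − C give C − F > A − C) and reverses the fourth choice.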

If a C5 implication involves only three stimuli (e.g., xyQyz), then it cannot be predicted by C4. It also follows that in such a case the premises of C5 must involve only five stimuli, not six. Therefore the number of predictions which can be made by C5 but not by C1-4 cannot exceed $\binom{n}{3}$, where n is the number of stimuli in K. However, even if a C5 implication is not predictable by C4, it may be predictable by C3. Since there are $\binom{n}{2}$ ordered utility intervals, there will be exactly $\binom{n}{2} - 1$ Q relations which are not predictable by C3. Therefore the maximum number of predictions which can be made by C5 but not by C1-4 is $\binom{n}{3}$ or $\binom{n}{2} - 1$, whichever is smaller. This upper bound reflects the maximum additional predictive power of a model which contains C5 as an axiom as well as C1-4.

Although C1-5 do not exhaust the empirical significance of the QH for
















all finite sets of stimuli, a weaker statement on sufficiency conditions can be made: if restriction is made to predictions based on not more than two choices (i.e., of the form “if A then B” or “if A and B then C”), then C1-5 do in fact exhaust the empirical significance of the QH. This means that under such a restriction, any prediction which can be made with the QH can also be made with C1-5. The increase in predictive power resulting from the utilization of more than two choices is slight for small sets of stimuli (fewer than ten), and hence this is an important result.

In formulating the WOM, the set of necessary and sufficient conditions for the existence of an OM scale has been specified. Such a model may be
compared in experimental situations with the Strong Ordered Metric Model 
(hereafter abbreviated SOM), which assumes the existence of a quaternary 
utility function. Such a strong model is perfectly satisfied if and only if the 
set of inequalities in utility values corresponding to the set of observed P and Q 
relations has a solution. However, as pointed out, even if the solution does not 
exist, the WOM may be satisfied, and hence an OM scale may exist. 


Relation to Two Other Theories 


The present model will be discussed in relation to two other theories. 
The first is that used by Siegel [8] to obtain an OM scale, and the second is 
a model of riskless choice developed by Adams and Fagot [1]. 

The essential device underlying Siegel’s approach is a one-person game 
in which the subject chooses between two options, each of which is a probability combination of two outcomes. A chance event E, which has an experimentally determined subjective probability of one-half, is defined. The subject is required to choose between two options (x, y) and (z, w). If he chooses the first and E occurs he gets x, while if E does not occur he gets y. The second option is determined in a similar manner.

If one now introduces a quaternary relation R such that xyRzw holds if and only if the subject chooses the option (x, y) when presented with a choice between (x, y) and (z, w), then condition (b), Definition 1, of a quaternary utility function may be modified as follows:


(b') xyRzw if and only if U'(x) + U'(y) > U'(z) + U'(w).


Condition (b’) is based on the assumption that an individual makes choices 
among alternatives involving risk as if he were trying to maximize expected 
utility ([8], pp. 212-213).

Thus there are two methods for deriving OM scales, and it would be of 
considerable interest to determine if the same scales would be derived by 
both methods. The following hypothesis postulates a relationship between R and Q.


RQ HYPOTHESIS. For every x, y, z, and w in K, xyRzw if and only if xwQzy.













This hypothesis is equivalent to the assumption that the utility func- 
tions U and U’ are the same. Given a set of axioms on Q (such as C1-5), 
these axioms can be immediately transformed to a set of axioms on R, by 
means of the RQ hypothesis. It then becomes possible to determine if indi- 
vidual axioms hold for both Q and R, rather than simply testing the stronger 
RQ hypothesis. It should be noted that the existence of U’ is not necessary 
for the derivation of an OM scale; satisfaction of axioms equivalent to C1-3 
is sufficient. 
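The claim that the RQ hypothesis amounts to U and U' being the same function can be checked under the expected-utility reading of condition (b'). The sketch below assumes P(E) = 1/2, scores each option by its expected utility, uses one hypothetical integer utility function as both U and U', and confirms that xyRzw and xwQzy always agree, since (X + Y)/2 > (Z + W)/2 exactly when X − W > Z − Y. Names and values are illustrative.

```python
import random


def expected_utility(Uprime, outcome_if_E, outcome_if_not_E, p_E=0.5):
    """Expected utility of an option that pays one outcome if E occurs and the other if not."""
    return p_E * Uprime[outcome_if_E] + (1 - p_E) * Uprime[outcome_if_not_E]


def R(Uprime, x, y, z, w):
    """xyRzw: option (x, y) has the larger expected utility, as in condition (b')."""
    return expected_utility(Uprime, x, y) > expected_utility(Uprime, z, w)


def Q(U, x, y, z, w):
    """xyQzw: U(x) - U(y) > U(z) - U(w), as in Definition 1(b)."""
    return U[x] - U[y] > U[z] - U[w]


rng = random.Random(2)
for _ in range(10_000):
    # Hypothetical integer utilities; the same function serves as U and U' (RQ hypothesis).
    U = {s: rng.randint(0, 1000) for s in "xyzw"}
    assert R(U, "x", "y", "z", "w") == Q(U, "x", "w", "z", "y")
print("xyRzw and xwQzy agreed in every sampled case")
```

Halving integers is exact in binary floating point, so the sketch cannot report a spurious disagreement from rounding.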

There are several interesting possibilities which could arise from ex- 
perimental comparisons of the two methods. For example, both U and U’ 
may exist, yet not be the same. Or, neither U nor U' may exist, yet an OM
scale may exist for both (i.e., the WOM may be satisfied for both Q and R 
observations), and this scale may not be the same. Finally, one method may 
produce an OM scale, but not the other. 

Another theory, formally quite similar to the present model, is a theory 
of riskless choice developed by Adams and Fagot [1]. This theory is concerned 
with subjects’ choices between alternatives which are multidimensional, 
although the theory has been worked out only for the two-dimensional (or 
two-component) case. Such alternatives might be political candidates who 
are described as varying in two characteristics, liberality and foreign policy, 
for example. In this case the subject would be required to state a preference 
for one of every pair of such candidates. Or, in an extension, the alternatives 
may be pairs of objects of any sort. For example, one two-component alterna- 
tive could be book x and book y, and the subject might have to choose between 
this alternative and a second which consisted of two books z and w. It is clear 
then that the components of the two-component alternatives can be thought 
of as belonging to a set K of alternatives, such as has been considered previ- 
ously in this paper. The model assumes that the individual behaves as if he 
assigns subjective values (utilities) to each of the components independently, 
and then adds the values together to get the value of the composite alternative. 
The fundamental assumption of the model is the hypothesis of the existence 
of this additive utility function. Comparable to condition (b), Definition 1, is the condition


(b'') (x, y)P(z, w) if and only if U''(x) + U''(y) > U''(z) + U''(w),
where P is the relation of strict preference.


The model generates an OM scale, and a simple transformation relates 
this model to the present model. This relationship is indicated by the follow- 
ing hypothesis. 

PQ HYPOTHESIS. For every x, y, z, and w in K, (x, y)P(z, w) if and only if xzQwy.

This hypothesis is equivalent to the assumption that the utility functions 
U and U” are the same. All three models are related by the following corollary. 













COROLLARY OF RQ AND PQ HYPOTHESES. For every x, y, z, and w in K, (x, y)P(z, w) if and only if xyRzw.
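The PQ hypothesis and the corollary can be checked exhaustively for a small hypothetical set of stimuli. The sketch assumes a single utility scale serving as U, U' and U'' at once (which is what the hypotheses assert) and verifies, for every ordered tetrad of distinct stimuli, that (x, y)P(z, w) agrees with xzQwy and with xyRzw. The utility values are illustrative only.

```python
from itertools import permutations

# Hypothetical utilities on a single scale, assumed to serve as U, U' and U'' at once.
U = {"a": 9, "b": 6, "c": 5, "d": 1}


def P_pair(x, y, z, w):
    """(x, y)P(z, w): additive riskless choice, condition (b'')."""
    return U[x] + U[y] > U[z] + U[w]


def R(x, y, z, w):
    """xyRzw: the 50-50 option (x, y) has higher expected utility, condition (b')."""
    return (U[x] + U[y]) / 2 > (U[z] + U[w]) / 2


def Q(x, y, z, w):
    """xyQzw: interval comparison, Definition 1(b)."""
    return U[x] - U[y] > U[z] - U[w]


tetrads = list(permutations(U, 4))
for x, y, z, w in tetrads:
    assert P_pair(x, y, z, w) == Q(x, z, w, y)  # PQ hypothesis
    assert P_pair(x, y, z, w) == R(x, y, z, w)  # corollary of the RQ and PQ hypotheses
print(f"PQ hypothesis and corollary agree on all {len(tetrads)} ordered tetrads")
```

Here the sketch only illustrates the algebra of the hypotheses; whether real P, Q and R judgments line up this way is the empirical question the section raises.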


It should be realized that R holds between options and P holds between 
alternatives. Thus in the case of P the outcome is x and y, whereas in the case 
of R the outcome is x or y, depending upon the outcome of a chance event E. 

In this additive model of riskless choice, the basic assumption of the 
additivity of the components is usually used simultaneously to determine the 
utility values and to make predictions. Under these conditions, the predictive 
consequences of the model are much weaker than those using utility values 
obtained outside the model by independent methods of measurement. 
Observations on Q or R relations could provide this independent method of 
measurement. In other words, U or U’ could be used to assign utility values to 
the alternatives in a basic set K, and then the additive model could be used 
to make predictions between pairs of these alternatives, under the assumption 
that the utility values are additive. 


Some Experimental Results 


Some results of an experiment on the utility of course grades afford a 
test of the model and illustrate the derivation of an OM scale. Ten students 
in introductory psychology served as subjects. The alternatives in the set K 
are the course grades A, B, C, D, and F. Appropriate operational definitions 
were given to the relations P and Q such that aPb means that the subject 
prefers the grade A to the grade B, and dfQab means that the difference, in 
value (utility), between the grades D and F is greater than the difference 
between the grades A and B. Care was taken to insure that each subject 
thoroughly understood the operation of comparing two utility intervals and 
reporting the interval which was subjectively greater. 

Let n be the number of stimuli. Then N_0, the number of Q relations predictable from a knowledge of an ordering of the stimuli, is

$$N_0 = 2\binom{n}{3} + \binom{n}{4}.$$


In this experiment there are five stimuli in K; if all comparisons are made, 
the complete data for each subject would consist of 10 P relations and 45 
Q relations. Of these 45 observations, 25 would be predictable from C2 
($N_0 = 25$). However, in a preference experiment of this sort, there seems 
little necessity for making a complete test of this consequence, so only 5 
such comparisons were presented to each subject. The responses of each of 
the ten subjects were consistent with C2. C1 was not tested; it was assumed 
that all subjects preferred the grades in the order aPbPcPdPf. 

The choices of one of the subjects will be selected for discussion. Table 1 
lists the 20 choices of this subject which cannot be predicted from a knowledge 
of an ordering of the stimuli, and hence these observations contain the 
ordered metric information. 

These choices are divided into subsets by means of C4; subset B is pre- 
dictable from subset A by means of C4. Utilizing this consequence one can 
predict (16) from (2), (17) from (5), (18) from (3), (19) from (12) and (20) 
from (8); thus C4 is perfectly satisfied. 
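A minimal sketch of how C4 produces subset B from subset A for this subject. It treats C4 as the rule that adding the gap between two non-overlapping intervals to both of them leaves the Q relation unchanged (our reading of the examples, not the formal statement of C4), writes each relation as a string such as abQcd, and uses hypothetical helper names.

```python
ORDER = "abcdf"  # grades A, B, C, D, F in the assumed preference order aPbPcPdPf

def c4_predict(rel, order=ORDER):
    """Return the subset B relation implied by a subset A relation, or None.

    Hypothetical helper: adds the gap between two non-overlapping intervals
    to both intervals, which leaves the direction of Q unchanged.
    """
    big, small = (tuple(side) for side in rel.split("Q"))
    rank = order.index
    # Which of the two intervals lies higher on the preference scale?
    upper, lower = (big, small) if rank(big[0]) < rank(small[0]) else (small, big)
    if rank(upper[1]) >= rank(lower[0]):
        return None  # intervals touch or overlap: C4 predicts nothing new
    new_upper = (upper[0], lower[0])  # upper interval stretched across the gap
    new_lower = (upper[1], lower[1])  # lower interval stretched across the gap
    if big == upper:
        new_big, new_small = new_upper, new_lower
    else:
        new_big, new_small = new_lower, new_upper
    return "".join(new_big) + "Q" + "".join(new_small)

# Subset A observations (2), (5), (3), (12), (8) as read from Table 1.
for a_obs in ["abQcd", "cfQab", "dfQab", "dfQac", "dfQbc"]:
    print(a_obs, "->", c4_predict(a_obs))
```

Run in that order, the predictions are acQbd, bfQac, bfQad, cfQad, and cfQbd, which are observations (16) through (20). Adjacent pairs such as abQbc return None, because there is no gap to add.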


TABLE 1 

Observations of One Subject in Experiment on Utility of Grades 

Subset A                                    Subset B 
 1. abQbc     6. bfQab    11. acQcd         16. acQbd 
 2. abQcd     7. bcQcd    12. dfQac         17. bfQac 
 3. dfQab     8. dfQbc    13. cfQac         18. bfQad 
 4. bdQab     9. cfQbc    14. adQdf         19. cfQad 
 5. cfQab    10. dfQcd    15. dfQbd         20. cfQbd 





A necessary and sufficient condition for admission of an observation to 
subset A is that the two intervals corresponding to the two ordered pairs of 
a Q relation do not overlap (they may share an endpoint). For example, if 
xPyPz, then the only such relation is xyQyz (or yzQxy, whichever is observed). 
The number of Q relations in subset A is 

$$N_A = \binom{n}{3} + \binom{n}{4}.$$


The two intervals for a Q relation in subset B always overlap. For 
example, observation (16) implies A − C > B − D, and it is clear that these 
two intervals have the interval B − C in common. The number of observations 
in subset B, and hence the number of predictions that can be made in any 
experiment by means of C4, provided that certain observations are made in 
subset A, is 


$$N_B = \binom{n}{4}.$$

It is important to note that although $N_B$ choices can be predicted from certain 
choices in subset A, these predictions cannot be made by the WOM, since 
C4 is not an axiom in this model. 
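The three counts can be checked by brute force. The sketch below (helper names are ours) classifies every pair of intervals on n ranked stimuli and compares the totals with the closed forms $N_0 = 2\binom{n}{3} + \binom{n}{4}$, $N_A = \binom{n}{3} + \binom{n}{4}$, and $N_B = \binom{n}{4}$, which give 25, 15, and 5 for the five grades. It assumes, as in the xPyPz example, that intervals sharing only an endpoint count as non-overlapping.

```python
from itertools import combinations
from math import comb

def classify_q_pairs(n):
    """Count pairs of utility intervals on n ranked stimuli (0 = most preferred).

    Returns (nested, non_overlapping, crossing): pairs whose Q relation follows
    from the ordering alone (C2), subset A pairs, and subset B pairs.
    """
    intervals = list(combinations(range(n), 2))
    nested = non_overlapping = crossing = 0
    for (i, j), (k, l) in combinations(intervals, 2):
        if (i <= k and l <= j) or (k <= i and j <= l):
            nested += 1           # one interval contains the other
        elif j <= k or l <= i:
            non_overlapping += 1  # at most a shared endpoint
        else:
            crossing += 1         # partial overlap, e.g. a-c versus b-d
    return nested, non_overlapping, crossing

def formula_counts(n):
    """N_0, N_A and N_B from the closed-form expressions."""
    return 2 * comb(n, 3) + comb(n, 4), comb(n, 3) + comb(n, 4), comb(n, 4)

print(classify_q_pairs(5), formula_counts(5))  # (25, 15, 5) (25, 15, 5)
```

The three classes exhaust all $\binom{\binom{n}{2}}{2}$ pairs of intervals, so for five stimuli they account for the 45 Q relations.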

Inspection of Table 1 shows that the choices of this subject were consistent 
with C3 (Q is transitive), and with C4. He satisfies not only the WOM 
but also the SOM (the set of inequalities corresponding to the observations 
has a solution). The OM scale which is derived from the observations in 
Table 1 is defined by the following ordering of utility intervals: 

$$A-F > B-F > C-F > A-D > D-F > A-C > B-D > A-B > B-C > C-D. \tag{4}$$
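A solution to the SOM inequalities can be exhibited directly. The utilities below (A = 21, B = 16, C = 12, D = 10, F = 0) are our own illustrative numbers, not values from the experiment; the sketch checks that they reproduce ordering (4) and satisfy every observation in Table 1 as read here.

```python
from itertools import combinations

# Illustrative utilities for grades A, B, C, D, F (our own choice, not data from
# the experiment); any values reproducing ordering (4) would serve.
UTILITY = {"a": 21, "b": 16, "c": 12, "d": 10, "f": 0}

# Observations (1)-(20) of Table 1; "xyQzw" means interval x-y exceeds interval z-w.
TABLE_1 = [
    "abQbc", "abQcd", "dfQab", "bdQab", "cfQab",
    "bfQab", "bcQcd", "dfQbc", "cfQbc", "dfQcd",
    "acQcd", "dfQac", "cfQac", "adQdf", "dfQbd",
    "acQbd", "bfQac", "bfQad", "cfQad", "cfQbd",
]

def interval(pair, u=UTILITY):
    """Utility difference spanned by a two-letter pair such as 'ab'."""
    x, y = pair
    return abs(u[x] - u[y])

def holds(relation, u=UTILITY):
    """True if the observed Q relation agrees with the utilities."""
    left, right = relation.split("Q")
    return interval(left, u) > interval(right, u)

def om_scale(u=UTILITY, order="abcdf"):
    """All ten intervals, ranked from largest to smallest utility difference."""
    pairs = ["".join(p) for p in combinations(order, 2)]
    return sorted(pairs, key=lambda p: interval(p, u), reverse=True)

print(om_scale())                        # af, bf, cf, ad, df, ac, bd, ab, bc, cd
print(all(holds(r) for r in TABLE_1))    # True
```

Only the ordering of the intervals matters to the OM scale; these particular numbers are one point in the region of solutions, not an estimate of the subject's utilities.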


In general, if there are t intervals, then (t − 1) Q relations are sufficient 
to define the OM scale, although they are not always necessary. For example, 
of the nine Q relations corresponding to (4), the following seven are sufficient 
to generate the OM scale by the WOM: cfQad, adQdf, dfQac, acQbd, bdQab, 
abQbc, bcQcd. The relations afQbf and bfQcf are not necessary, since they are 
implied by the P relations by means of C2. The remaining 38 Q relations 
therefore contain superfluous information, since they are predictable from 
these seven by means of the axioms of the WOM; this redundancy provides a 
test of the model. Depending on the structure of the scale, as few as three 
observations may in some cases provide sufficient information for the 
construction of the OM scale. Unfortunately, precisely which observations 
provide this information cannot be specified in advance. However, Siegel 
([4], p. 141) has developed a “maximin” method which maximizes the number 
of predictions which may be made from observing a minimum number of 
choices, and specifies the order in which choices should be examined in 
constructing the scale. His method may be adapted for use with the present 
model. 

The number of observations necessary and sufficient to represent the 
entire set of observations can be taken as a measure of the predictive power 
of the model. The smaller the number of such observations, the greater is the 
predictive power of the model. In the case above, two of the observations, 
cfQad and acQbd, are not necessary in the SOM since they are implied by 
dfQac and abQcd, respectively, by C4. Thus in the SOM only five observations 
are necessary, whereas in the WOM seven are necessary, indicating the 
correspondingly greater predictive power of the former. In this experiment 
the number of choices necessary and sufficient to represent the total set of 
45 observations ranged, in the WOM, from a low of six for one subject to a 
high of eight for another subject, and in the SOM, from a low of three to a high 
of five, the lower values of the SOM being indicative of its greater predictive 
power. This concept of predictive power is treated more extensively in other 
sources ([5]; [3], pp. 28-36). 

The general results of the experiment are as follows. 

1. Seven of the ten subjects perfectly satisfied the WOM; hence for these 
seven subjects it was possible to construct an OM scale. 

2. Two of these seven failed to satisfy the SOM. These two subjects 
are of special interest, since for them it was possible to construct an OM scale 
in spite of the fact that they did not satisfy the QH. One of these two had a 
single disconfirmation of C4, and the second had a single disconfirmation 
of C5. 

3. Seven of the ten subjects perfectly satisfied C4 (five times each); 
the remaining three each had a single disconfirmation. 

4. At most one prediction for each subject can be made by C5 which 
cannot be made by means of C1-4, since there are only five stimuli in this 
experiment. The results were that one such prediction was made for each of 
three subjects, and one of these predictions was disconfirmed. (If the number 
of stimuli were increased by one to six, then the maximum number of impli- 
cations of C5 for each subject would increase from one to six.) 

5. For each of the three subjects who did not perfectly satisfy the WOM, 
the reversal of a single choice would have resulted in satisfaction of the model; 
however, one of these three still would not have satisfied the SOM. Methods 
of analyzing the fit of the model have been devised, but discussion will be 
deferred to a later paper on the analysis of a series of such experiments. 


REFERENCES 


[1] Adams, E. W. and Fagot, R. F. A model of riskless choice. Behav. Sci., 1959, 4, 1-10. 

[2] Coombs, C. H. Psychological scaling without a unit of measurement. Psychol. Rev., 
1950, 57, 145-158. 

[3] Fagot, R. F. An ordered metric model of individual choice behavior. Tech. Rep. No. 
13, Contract Nonr 225(17), Stanford Univ., 1957. 

[4] Hurst, P. M. and Siegel, S. Prediction of decisions from a higher ordered metric scale 
of utility. J. exp. Psychol., 1956, 52, 138-144. 

[5] Kemeny, J. and Oppenheim, P. Systematic power. Phil. Sci., 1955, 22, 27-33. 

[6] Kuhn, H. W. Solvability and consistency for systems of linear equations and inequali- 
ties. Amer. math. Mon., 1956, 63, 217-232. 

[7] Scott, D. and Suppes, P. Foundational aspects of theories of measurement. Tech. Rep. 
No. 6, Contract Nonr 225(17), Stanford Univ., 1957. 

[8] Siegel, S. A method for obtaining an ordered metric scale. Psychometrika, 1956, 21, 
207-216. 

[9] Suppes, P. Introduction to logic. Princeton: Van Nostrand, 1957. 


Manuscript received 9/29/58 


Revised manuscript received 10/24/58 














PSYCHOMETRIKA—VOL. 24, NO. 2 
JUNE, 1959 


A NOTE ON FACTOR ANALYSIS: ARBITRARY 
ORTHOGONAL TRANSFORMATIONS 


EDWARD E. CURETON 


UNIVERSITY OF TENNESSEE 


A modification of the Gram-Schmidt process yields an easily constructed 
orthogonal transformation matrix which may be used to rotate a centroid, 
principal axis, or maximum likelihood factor matrix in a manner such that 
one of the new axes has predetermined direction. The procedure is illustrated 
by rotating a centroid factor matrix into an abbreviated bifactor matrix, the 
general factor being defined as the centroid of a specified subgroup of reason- 
ing tests. 


Suppose that we start with an n by m orthogonal factor matrix A, such as a 
centroid, principal axis, or maximum likelihood matrix, with entries $a_{ij}$ 
representing the loadings of n tests or other initial variables on m factors, 
and that it is decided on substantive (rather than mathematical) grounds that 
one rotated axis should pass through a particular point. The other axes may be 
anywhere, so long as they remain orthogonal to the one chosen. The problem is 
to construct an orthogonal transformation matrix to accomplish this purpose. 

Reyburn and Taylor have discussed this problem [2], and have presented 
a method for constructing the desired transformation matrix. Their method, 
however, is fairly complicated, though less so, as they point out, than the 
direct application of the Gram-Schmidt process. By taking advantage of 
the fact that all of the axes but one may be defined arbitrarily, the Gram- 
Schmidt process may be simplified to the point where the desired trans- 
formation matrix can be computed with much less effort than is required 
by the Reyburn-Taylor procedure. 

The general Gram-Schmidt process converts any nonsingular matrix X into an 
orthogonal matrix Z whose first row is a unit vector collinear with the first 
row vector of X. If the first row vector of X is a unit vector, the first row 
vector of Z is identical to it. In this case the general Gram-Schmidt equa- 
tions are 


$$
\begin{aligned}
Z_1 &= X_1,\\
Z_2 &= [X_2 - (Z_1\cdot X_2)Z_1]/K_2,\\
Z_3 &= [X_3 - (Z_1\cdot X_3)Z_1 - (Z_2\cdot X_3)Z_2]/K_3,\\
Z_4 &= [X_4 - (Z_1\cdot X_4)Z_1 - (Z_2\cdot X_4)Z_2 - (Z_3\cdot X_4)Z_3]/K_4,\\
Z_5 &= [X_5 - (Z_1\cdot X_5)Z_1 - (Z_2\cdot X_5)Z_2 - (Z_3\cdot X_5)Z_3 - (Z_4\cdot X_5)Z_4]/K_5,
\end{aligned}
$$

etc. The $Z_i$ and $X_i$ are the row vectors of Z and X, the $Z_i \cdot X_j$ in parentheses 
are scalar products, and the $K_j$ are normalizing factors. 
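For readers who want to check the process numerically, here is a minimal classical Gram-Schmidt sketch in plain Python (function names are ours). It assumes the rows of X are linearly independent; when the first row of X is already a unit vector it comes back unchanged as $Z_1$.

```python
import math

def dot(u, v):
    """Scalar product of two row vectors."""
    return sum(p * q for p, q in zip(u, v))

def gram_schmidt_rows(X):
    """Classical Gram-Schmidt on the rows of X (a list of equal-length lists).

    Z_j = [X_j - sum over i < j of (Z_i . X_j) Z_i] / K_j, where K_j is the
    length of the bracketed vector. Assumes the rows are linearly independent.
    """
    Z = []
    for x in X:
        v = list(x)
        for z in Z:
            coef = dot(z, x)  # the scalar product Z_i . X_j
            v = [vi - coef * zi for vi, zi in zip(v, z)]
        K = math.sqrt(dot(v, v))  # normalizing factor K_j
        Z.append([vi / K for vi in v])
    return Z

# A unit first row followed by rows of an identity matrix.
X = [[2 / 3, 1 / 3, 2 / 3], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
Z = gram_schmidt_rows(X)
print([[round(x, 4) for x in row] for row in Z])
```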


Let the selected unit vector be a, b, c, d, e, ..., and let X be of the form 

$$X = \begin{bmatrix} a & b & c & d & e \\ 1 & 0 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 1 & 0 \end{bmatrix}.$$





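The scalar products of the Gram-Schmidt equations then become single 
elements of Z. Writing $z_{ij}$ for the element in row i and column j of Z, 

$$
\begin{aligned}
Z_1 &= X_1,\\
Z_2 &= (X_2 - aZ_1)/K_2,\\
Z_3 &= (X_3 - bZ_1 - z_{22}Z_2)/K_3,\\
Z_4 &= (X_4 - cZ_1 - z_{23}Z_2 - z_{33}Z_3)/K_4,\\
Z_5 &= (X_5 - dZ_1 - z_{24}Z_2 - z_{34}Z_3 - z_{44}Z_4)/K_5,
\end{aligned}
$$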


etc. These equations can be solved literally for the two-variable, three- 
variable, four-variable, and five-variable cases. First define 


$$k_2^2 = 1 - a^2, \qquad k_3^2 = 1 - a^2 - b^2, \qquad k_4^2 = 1 - a^2 - b^2 - c^2.$$


These k’s are related but not identical to the K’s of the previous equations. 
The desired transformation matrices are then 


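$$Z = \begin{bmatrix} a & b \\ b & -a \end{bmatrix},$$

$$Z = \begin{bmatrix} a & b & c \\ k_2 & -ab/k_2 & -ac/k_2 \\ 0 & c/k_2 & -b/k_2 \end{bmatrix},$$

$$Z = \begin{bmatrix} a & b & c & d \\ k_2 & -ab/k_2 & -ac/k_2 & -ad/k_2 \\ 0 & k_3/k_2 & -bc/k_2k_3 & -bd/k_2k_3 \\ 0 & 0 & d/k_3 & -c/k_3 \end{bmatrix},$$

$$Z = \begin{bmatrix} a & b & c & d & e \\ k_2 & -ab/k_2 & -ac/k_2 & -ad/k_2 & -ae/k_2 \\ 0 & k_3/k_2 & -bc/k_2k_3 & -bd/k_2k_3 & -be/k_2k_3 \\ 0 & 0 & k_4/k_3 & -cd/k_3k_4 & -ce/k_3k_4 \\ 0 & 0 & 0 & e/k_4 & -d/k_4 \end{bmatrix}.$$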


The system by which larger matrices may be constructed will be clearer if 
it is noted that $k_1 = 1$ and that the final k of each matrix (not shown as such 
in the last row of each of those above) is equal to the last element of the 
first row. Thus for the 5 by 5 matrix, $k_5^2 = 1 - a^2 - b^2 - c^2 - d^2 = e^2$, and 
$k_5 = e$, since $Z_1$ is a unit vector. With this notation, 


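$$Z = \begin{bmatrix} a & b & c & d & e \\ k_2/k_1 & -ab/k_1k_2 & -ac/k_1k_2 & -ad/k_1k_2 & -ae/k_1k_2 \\ 0 & k_3/k_2 & -bc/k_2k_3 & -bd/k_2k_3 & -be/k_2k_3 \\ 0 & 0 & k_4/k_3 & -cd/k_3k_4 & -ce/k_3k_4 \\ 0 & 0 & 0 & k_5/k_4 & -de/k_4k_5 \end{bmatrix}$$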











and for the two-, three-, or four-variable case only the first two, three, or four 
rows and columns are used.* The required transformation is then 


$$AZ' = G,$$


G being the rotated factor matrix. 
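The closed-form rows can be generated for any number of variables. This sketch (function name is ours) follows the pattern of the general matrix: $k_1 = 1$, $k_j^2 = 1 -$ (sum of squares of the first $j - 1$ elements of the first row), and, as noted above, the final k is set equal to the last element of the first row. It assumes that last element is nonzero and every intermediate $k_j$ is positive.

```python
import math

def closed_form_z(first_row):
    """Transformation matrix whose first row is the given unit vector.

    Row j (j >= 2, counting from 1) has k_j / k_{j-1} in column j - 1, zeros to
    its left, and -v_{j-1} v_m / (k_{j-1} k_j) in every column m to its right.
    """
    v = [float(x) for x in first_row]
    m = len(v)
    k = [1.0]  # k[0] plays the role of k_1 = 1
    for j in range(1, m):
        if j == m - 1:
            k.append(v[-1])  # final k equals the last element of the first row
        else:
            k.append(math.sqrt(1.0 - sum(x * x for x in v[:j])))
    z = [v[:]]
    for j in range(1, m):
        row = [0.0] * m
        row[j - 1] = k[j] / k[j - 1]
        for col in range(j, m):
            row[col] = -v[j - 1] * v[col] / (k[j - 1] * k[j])
        z.append(row)
    return z

# First row used in the Example; compare with the printed three-variable Z.
for row in closed_form_z([0.9059, 0.2405, -0.3485]):
    print([round(x, 4) for x in row])
```

With this four-decimal first row the result agrees with the printed Z to within about one unit in the fourth decimal. Taking the final k as the signed last element, rather than its absolute value, is what makes the last row come out with the signs shown in the text.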


Example 


The A matrix of Table 1 is a centroid matrix computed by the writer 
from a table of intercorrelations reported by Swineford ([3], p. 14, Table 4). 
Her subjects were 504 school children of both sexes in grades 5 through 12, 
all between the ages of 12-0 and 14-5, and all from the Chicago metropolitan 
area. The centroid solution was iterated ten times to obtain communalities 
all stable within ±.001. 

Swineford argues on substantive (psychological) grounds that a general 
factor can be defined by the first three tests, which appear to the present 
writer to be reasoning tests. Accepting her argument, the centroid matrix 
will be rotated orthogonally so that one axis passes through the centroid of 
tests 1, 2, and 3. The first row of the transformation matrix Z consists of 
the normalized sums of the first three entries in each column of A. The rest 
of the transformation matrix is constructed by the Gram-Schmidt formulas 

*I am indebted to my colleague, Dr. Frederick A. Ficken, Department of Mathematics, 
University of Tennessee, for a proof that Z, as defined by this last matrix, is 
actually orthogonal. 










for the three-variable case. Thus 
$$Z = \begin{bmatrix} .9059 & .2405 & -.3485 \\ .4235 & -.5144 & .7455 \\ 0 & -.8229 & -.5679 \end{bmatrix}.$$


TABLE 1 

Factor Matrices (Swineford Data; N = 504) 

                              A                       G                  F                       S 
Name of test          a_i1  a_i2  a_i3   h²    g_i1  g_i2  g_i3    f_i1  f_i2  f_i3    s_i1  s_i2  s_i3   h² 
1. Arith. Comput.      659   285  -308   610    773  -097  -060     773  -111  -026     677   000   000   458 
2. Series Complet.     643   065  -243   477    683   058   085     683   101  -019     688   000   000   473 
3. Deduction           619   160  -188   444    665   040  -025     665   011   046     661   000   000   437 
4. General Inf.        802   228   235   750    699   398  -321     699   054   508     757   000   415   745 
5. Reading Comp.       730   202   106   585    673   284  -226     673   041   361     718   000   265   586 
6. Word Meaning        756   261   397   797    609   482  -440     609   030   652     668   000   622   833 
7. Punched Holes       716  -512  -085   782    555   503   470     555   688   023     578   652   000   759 
8. Drawings            660  -420   029   613    487   517   329     487   598   133     537   577   000   621 
9. Vis. Imagery        638  -266   057   481    494   450   187     494   450   186     554   413   000   477 
Sum                   6223   003   000  5539   5638  2635  -001    5638  1862  1864                      5389 
Check                                          5638  2634  -002    5638  1863  1864 

Note: All entries have three decimal places. 


Postmultiplying A by Z' is equivalent to taking the scalar product of each row 
of A with each row of Z. Doing this, the entries in matrix G of Table 1 are 
obtained. Those in the $g_{i1}$ column are the loadings on the general factor 
defined by the centroid of tests 1, 2, and 3. These tests have small loadings 
summing to zero within rounding error in columns $g_{i2}$ and $g_{i3}$, as they 
should. For the rest, columns $g_{i2}$ and $g_{i3}$ represent a residual centroid 
matrix; all entries in column $g_{i2}$ except those of the first three tests are 
positive and substantial, and those in column $g_{i3}$ sum to zero within 
rounding error. If this were not already a residual centroid matrix, it could 
be made into one by normalizing the sums of the last six entries in columns 
$g_{i2}$ and $g_{i3}$, using these values as the first row of a new two-variable Z, 
completing this Z by the Gram-Schmidt formulas, and postmultiplying columns 
$g_{i2}$ and $g_{i3}$ by Z'. 
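The first row of Z and the first three rows of G can be recomputed from Table 1. The sketch below (plain Python, names are ours) normalizes the column sums of tests 1-3 in A and then forms the scalar product of each of those rows with each row of the printed Z. It reads the third loading of test 2 as −.243, the value consistent with that test's communality of .477 and with its entries in G.

```python
import math

# First three rows of the centroid matrix A (tests 1-3), decimal points restored.
# The third loading of test 2 is read as -.243 (see the lead-in).
A_REASONING = [
    [0.659, 0.285, -0.308],
    [0.643, 0.065, -0.243],
    [0.619, 0.160, -0.188],
]

# Z as printed for the three-variable case.
Z = [
    [0.9059, 0.2405, -0.3485],
    [0.4235, -0.5144, 0.7455],
    [0.0, -0.8229, -0.5679],
]

def normalized_column_sums(rows):
    """Unit vector through the centroid of the given rows of A."""
    sums = [sum(col) for col in zip(*rows)]
    length = math.sqrt(sum(s * s for s in sums))
    return [s / length for s in sums]

def rotate_row(a_row, z=Z):
    """One row of G = A Z': scalar products of a row of A with each row of Z."""
    return [sum(a * w for a, w in zip(a_row, z_row)) for z_row in z]

print([round(x, 4) for x in normalized_column_sums(A_REASONING)])
for a_row in A_REASONING:
    print([round(g, 3) for g in rotate_row(a_row)])
```

The printed values should agree with the first row of Z and the first three rows of G to rounding, and the $g_{i2}$ and $g_{i3}$ entries for these tests sum to approximately zero.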

To complete the derived bifactor solution, the residual centroid should be 
located at the center of the positive quadrant. Hence the axes of $g_{i2}$ and 
$g_{i3}$ should be rotated so that the new axes will be at 45 degree angles to the 
first residual centroid axis. To do this one employs the two-variable Landahl 
transformation [1], 



















$$L = \begin{bmatrix} .7071 & .7071 \\ .7071 & -.7071 \end{bmatrix}.$$


Postmultiplying columns $g_{i2}$ and $g_{i3}$ of G by L' yields the columns $f_{i2}$ and 
$f_{i3}$ of matrix F, Table 1. Column $f_{i1}$ is identical to column $g_{i1}$ of matrix G, 
and matrix F is the derived bifactor solution. Matrix S is Swineford's original 
bifactor solution, computed by Holzinger's methods. 
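Because every element of L is $\pm 1/\sqrt{2}$, the rotation reduces to sums and differences of the two loadings. A sketch (function name is ours), applied to three rows of G as read from Table 1:

```python
import math

def landahl_45(g2, g3):
    """Rotate one pair of loadings through 45 degrees: (f2, f3) = (g2, g3) L',
    with L = [[c, c], [c, -c]] and c = 1/sqrt(2) (printed as .7071)."""
    c = 1.0 / math.sqrt(2.0)
    return c * (g2 + g3), c * (g2 - g3)

# (g_i2, g_i3) for tests 1, 4, and 7 as read from matrix G of Table 1.
for g2, g3 in [(-0.097, -0.060), (0.398, -0.321), (0.503, 0.470)]:
    f2, f3 = landahl_45(g2, g3)
    print(round(f2, 3), round(f3, 3))
```

The three printed pairs are -0.111 -0.026, 0.054 0.508, and 0.688 0.023, matching columns $f_{i2}$ and $f_{i3}$ of F for those tests.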

Considering first the similarities between the two solutions, it is clear 
that the first factor is a general factor, the second factor a space factor, and 
the third factor a verbal factor. So far as gross factorial interpretation is 
concerned, the two solutions are essentially equivalent. 

Considering the differences between the two solutions, it is the writer’s 
opinion that the derived solution is superior on three counts. First, it uses 
all of the common variance. This property, admittedly, depends upon the 
fact that the centroid solution was iterated, a tedious procedure at best. 
But the centroid communalities sum to 5.539, while the direct bifactor 
communalities sum to only 5.389. The distributions of the third-factor 
centroid residuals (at the tenth iteration) and the direct bifactor residuals 


appear below. 


Residual from:   .070   .050   .030   .010  -.010  -.030  -.050  -.070  -.090  -.110 
Residual to:     .089   .069   .049   .029   .009  -.011  -.031  -.051  -.071  -.091 





Centroid: 1 9 18 6 2 
Bifactor: 1 1  § 3 14 5 3 1 1 


The centroid method clearly gives a better factorial fit. With larger matrices, 
moreover, the number of iterations required to stabilize the communalities 
is much reduced. 

The second reason for preferring a derived solution is that in it errors 
are averaged over whole columns of the correlation matrix, and distributed 
over all factor loadings, both before and after rotation. In the direct bifactor 
method, near-zeros become exact zeros, and all error variance is forced into 
the nonzero loadings. 

The third reason for preferring a derived solution is that it shows the 
secondary relationships more clearly. In columns $f_{i2}$ and $f_{i3}$ of F, all entries 
other than the first three are positive. Hence it is clear at once that there 
is some variance common to the verbal and space factors over and above 
what they have in common with the reasoning-general factor. In the direct 
bifactor solution, the general factor is not wholly defined by the three reason- 
ing tests; it includes also most of the variance common to the space tests 
and the verbal tests. A certain amount of additional ‘forcing’ is necessary 
to achieve this result; it cannot be produced by rotations in 3-space without 
moving the general-factor axis away from the centroid of the three reasoning 
tests, in which case these tests acquire larger loadings on the other two factors. 

This is, in fact, an abbreviated bifactor solution. A complete bifactor 
solution ([4], ch. XIX) requires a general factor in addition to as many group 
factors as there are simple-structure factors. The general factor is then 
defined by the correlations between the primary factors of the oblique simple 
structure rather than by any one subgroup of the tests, and the general-factor 
loadings are obtained by projecting the second-order factor loadings back 
into the first-order domain. 


REFERENCES 


[1] Landahl, H. D. Centroid orthogonal transformations. Psychometrika, 1938, 3, 219-223. 

[2] Reyburn, H. A. and Taylor, J. G. Some factors of personality. Brit. J. Psychol., Gen. 
Sec., 1939, 30, 151-165. 

[3] Swineford, F. A study in factor analysis: the nature of the general, verbal, and spatial 
bi-factors. Supplementary educ. Monogr., 1948, No. 67. 

[4] Thomson, G. H. The factorial analysis of human ability. (3rd ed.) New York: Houghton 
Mifflin, 1948. 


Manuscript received 6/5/58 

























PSYCHOMETRIKA—VOL. 24, No. 2 
JUNE, 1959 


RANDOMLY PARALLEL TESTS AND LYERLY’S BASIC 
ASSUMPTION FOR THE KUDER-RICHARDSON FORMULA (21) 


FREDERIC M. LORD 
EDUCATIONAL TESTING SERVICE 


The K-R (21) formula can be derived without recourse to the unde- 
sirable assumption that the test items are all indistinguishable. 


Certain of the statements made by Lyerly [4] about the assumptions 
underlying the K-R (21) formula will mislead readers into rejecting it on the 
grounds that the assumptions are rather implausible.* The present note is 
written in the hope of preventing such a conclusion. 

Lyerly ([4], p. 269) states: “The basic assumption for K-R (21) ... 
implies that individual items have no separate identities, even though each 
may be worded differently or may ask a question different from that asked 
by any other item. Furthermore, item equivalence from person to person 
does not exist. The same item, worded identically, may appear on everyone’s 
test booklet, but item 1 for Subject A is regarded as being no more like item 
1 for Subject B than it is like item 2 for Subject B.” A few sentences later, he 
asserts that Lord’s derivation [1] “implicitly uses this assumption.” 

It is a fact that K-R (21) can be derived from the fundamental assumption 
just quoted. It does not follow that the derivation of K-R (21) in [1], or other 
derivations, “implicitly use” this assumption, or entail its implication. 
K-R (21) was originally derived on the assumption that all the test items 
were equivalent (of equal difficulty and discriminating power); it does not 
follow that the derivation in [1] is based on the same assumption, although 
this too has been asserted. If two sets of assumptions lead to the same con- 
clusions in all possible applications, then they may be considered, in a sense, 
identical assumptions; but the fact that they lead to the same conclusion 
in some particular case does not prove that the assumptions are identical, 
nor that their implications are the same. 

From one point of view, the derivation of K-R (21) in [1] does not rest 
on “assumptions” at all. The derivation simply asserts the following. 

Prepare a very large pool of dichotomous achievement or aptitude items 
of any kind or kinds whatsoever. Draw randomly and independently from 

*While Lyerly’s assumptions may seem rather implausible for ordinary mental 
testing, they are highly appropriate for certain other situations—for example, for rating 
situations where no two subjects are rated by the same judges. Lyerly’s results should 


be of real value in such situations. The present note deals only with the usual type of 
objective test, as do the references cited. 















this pool a number of samples of n > 2 items each. Each sample will be 
treated as an n-item test, administered (with ample time limit) to a group of 
examinees, and scored by counting the number of right answers for each 
examinee. (In actual practice special measures will be needed to minimize 
fatigue and practice effects, and also other types of variability within the 
individual examinee; these admittedly are not taken into account by the 
derivation, but they are irrelevant to the point at issue here.) 

Now, take the scores of a particular examinee on the various n-item 
tests and compute an unbiased estimate of their standard deviation. This 
is his estimated standard error of measurement. Obtain an average (more 
precisely, the quadratic mean) of the estimated individual standard errors 
of measurement. What the derivation in [1] asserts is that the average standard 
error of measurement so obtained is the same (except for sampling fluctua- 
tions) as that obtained from K-R (21) for any of the n-item tests by the 
usual formula 


$$\text{S. E. Meas.} = s_x\sqrt{1 - r_{xx}},$$

where $r_{xx}$ is K-R (21). 
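One way to see the assertion numerically, under the binomial model implied by drawing items at random from a very large pool (an assumption of this sketch, not a reproduction of the derivation in [1]): an unbiased estimate of one examinee's error variance on an n-item test is x(n − x)/(n − 1), where x is his number-right score. The quadratic mean of the square roots of these estimates equals $s_x\sqrt{1 - r_{xx}}$ with $r_{xx}$ from K-R (21) exactly, provided $s_x^2$ is computed with N rather than N − 1 in the denominator. The scores below are made up.

```python
import math

def _mean_and_variance(scores):
    """Mean and population (divide-by-N) variance of number-right scores."""
    n_people = len(scores)
    mean = sum(scores) / n_people
    variance = sum((x - mean) ** 2 for x in scores) / n_people
    return mean, variance

def kr21(scores, n_items):
    """Kuder-Richardson formula (21) for an n-item test."""
    mean, variance = _mean_and_variance(scores)
    return n_items / (n_items - 1) * (1 - mean * (n_items - mean) / (n_items * variance))

def sem_kr21(scores, n_items):
    """S. E. Meas. = s_x * sqrt(1 - r_xx), with r_xx given by K-R (21)."""
    _, variance = _mean_and_variance(scores)
    return math.sqrt(variance) * math.sqrt(1 - kr21(scores, n_items))

def sem_quadratic_mean(scores, n_items):
    """Quadratic mean over examinees of the binomial error estimates."""
    per_person = [x * (n_items - x) / (n_items - 1) for x in scores]
    return math.sqrt(sum(per_person) / len(per_person))

scores = [12, 15, 9, 18, 14, 11, 16, 7, 13, 15]  # made-up scores on a 20-item test
print(sem_kr21(scores, 20), sem_quadratic_mean(scores, 20))
```

The two numbers agree apart from floating-point rounding. The identity follows because the mean of x(n − x) is nM − M² − s², which makes both squared quantities equal to (M(n − M) − s²)/(n − 1).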

The foregoing assertion, made in [1], is experimentally verifiable, provided 
fatigue, practice effect, and other temporal variability in the examinee 
can be minimized. If temporal effects are eliminated from consideration, so 
that the order in which the items are administered is immaterial, then it will 
make no difference whether the various n-item tests are drawn by random 
sampling in advance of the testing, or whether the pool of items is administered 
first and the random samples of n items drawn afterwards. In the latter case, 
the experiment is identical with that performed in an elementary statistics 
course of drawing black and white balls from an urn. The results that will 
be obtained have been well known since the time of Jacob Bernoulli. 

When the examinees took an n-item test in the situation just outlined, 
the n items had their own clear identities. This is manifested, for example, 
by the fact that in the case of an easy item most (or even all) examinees 
answer it correctly whenever it is administered. If n is small, one n-item 
test may be appreciably easier than another simply because it happens to 
contain several items that are unusually easy. (This means that the errors of 
measurement for examinee A are not independent, over tests, of the errors of 
measurement for examinee B; but this fact need not detain us here.) 

The n-item tests described are called “randomly parallel” tests. In 
actual practice, “parallel” tests are not constructed by random sampling 
(although in many cases this would probably be better than reliance on sheer 
subjective judgment). On the other hand, neither are tests constructed so 
well that they are strictly parallel according to the assumptions underlying 
K-R (20), as evidenced by the fact that “parallel” forms usually have slightly 
different score means, variances, and so forth. Although neither mathematical 
model describes exactly any actual pair of so-called “parallel” tests, both 
models are very useful in mental test theory. The utility of the randomly 
parallel test model is evidenced by some of the practical conclusions to which 
it has led, as shown in [2] and [3]. It is desirable, therefore, that this model 
should not be identified with the assumption given in quotes at the beginning 
of this note. 


REFERENCES 


[1] Lord, F. M. Sampling fluctuations resulting from the sampling of test items. 
Psychometrika, 1955, 20, 1-22. 

[2] Lord, F. M. Tests of the same length do have the same standard error of measurement. 
Educ. psychol. Measmt, in press. 

[3] Lord, F. M. Statistical inferences about true scores. Psychometrika, 1959, 24, 1-17. 

[4] Lyerly, S. B. The Kuder-Richardson formula (21) as a split-half coefficient, and some 
remarks on its basic assumption. Psychometrika, 1958, 23, 267-270. 


Manuscript received 9/10/58 


























PSYCHOMETRIKA—VOL. 24, No. 2 
JUNE, 1959 


ESTIMATING ITEM INDICES BY NOMOGRAPHS* 


ROBERT M. COLVER 
DUKE UNIVERSITY 


Two nomographs are presented for estimating item validity indices 
identical in value to those obtained from Flanagan’s table and to those 
obtained from Davis’ chart. Experience has shown that the use of the nom- 
ographs results in the saving of a significant amount of time with no loss in 
accuracy. The nomographs also provide a method of quick conversion between 
the familiar coefficients and the Davis indices, which are less familiar but 
which offer greater flexibility. 


In evaluating the effectiveness of a test, the difficulty and the dis- 
criminatory power of the component items have significant bearing. The 
usability, reliability, and validity of the test are in part determined by these 
characteristics. 

Some of the more common statistical indices currently in use to evaluate 
test items are: critical ratio, phi coefficient, biserial correlation, point biserial 
correlation, tetrachoric correlation, and product moment correlation. 
Although these indices are based on different underlying assumptions, all 
involve computational procedures that become laborious. The calculation 
needed for the necessarily substantial number of items in a test often repre- 
sents more time and effort than are available to many test makers. 

Two general methods have been developed to reduce this computational 
labor: graphic methods (nomographs and Abacs) for estimating the various 
coefficients, and tabular presentations of coefficients and indices. These tables 
are based primarily on procedures outlined by Kelley [9]. 

Nomographs for estimating critical ratios or t’s were developed by 
Appel [1] and Arnold [2]. Dunlap [4] modified the standard Dunlap and 
Kurtz Nomograph [5] to estimate biserial correlations for test-item analysis 
purposes. Kuder [10] devised a set of nomographs to estimate point biserial r, 
biserial r, and four-fold correlations. Lawshe [11] presented a nomograph, 
based on the Kelley procedures, for estimating the discriminatory values of 
test responses using a scale of “D-values” from +5 to −5. 

Guilford [8] constructed an Abac for estimating the significant (and very 
significant) levels of phi coefficients. Fan [6], in developing an item analysis 


*© Robert M. Colver, 1959. 


table, also used an Abac, as did Flanagan [7]. The use of either of these Abacs 
for reading item indices would be difficult because of the complexity of their 
diagrams. 

Currently the most widely used techniques are those involving the 
reading of tables based primarily on Kelley’s procedure. Probably the one 
most satisfactory item validity index is that obtainable from Flanagan’s 
table ([13], pp. 345-351), which provides an estimate of the correlation 
between responses to an item and total test score. One limitation in the use 
of Flanagan’s estimated coefficient is that units on the scale of correlation 
values are not linear; that is, the difference between correlations of .25 and 
.30 is not comparable to a difference between correlations of .93 and .98. 
Davis [3, 12] developed a table of discrimination indices based on a z trans- 
formation of the correlation coefficient multiplied by a constant. This provides 
indices for which the units are approximately equal throughout the scale, and 
avoids the use of decimal points. Davis also includes a difficulty index based 
on a linear transformation of the proportions of success. The Davis table so 
far has had limited use in spite of the advantages of the linearity of his 
indices, because, as Thorndike [14] suggested, “... the numerical values of 
the proposed new indices will be entirely unfamiliar to the user and will 
present some difficulty of interpretation on that account.” 
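The unequal spacing is easy to see if the z transformation is taken to be Fisher's $z = \tanh^{-1} r$. That is an assumption here; Davis' multiplying constant is not given in the text, so `scale` below is a placeholder.

```python
import math

def z_index(r, scale=1.0):
    """Fisher's z = artanh(r), multiplied by a constant.

    `scale` is a placeholder; Davis' actual constant is not reproduced here.
    """
    return scale * math.atanh(r)

low_step = z_index(0.30) - z_index(0.25)   # about 0.054
high_step = z_index(0.98) - z_index(0.93)  # about 0.639
print(round(low_step, 3), round(high_step, 3))
```

On the z scale the step from .93 to .98 is more than ten times the step from .25 to .30, although both are differences of .05 in r. Any constant multiplier rescales both steps equally and leaves that ratio unchanged.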

While the tabular presentations of Flanagan and of Davis have greatly 
simplified the computational problems involved in determining item validities, 
both presentations have minor limitations. Flanagan’s table, as it is usually 
published, has a number of pages of tables to be consulted; and Davis’ table, 
although usually on a single page, is of an unwieldy size. Both tables require 
that the user make four-way interpolations in many of the readings. 

The nomographs presented here combine the speed and simplicity of 
nomograph reading with the familiarity of Flanagan’s table and the ad- 
vantages of Davis’ indices: they also provide a method of quick conversion 
between the familiar Flanagan coefficients and Davis’ indices, which are less 
familiar but which offer greater flexibility and convenience in use. 


The Nomographs 


The nomographs* shown as Figures 1 and 2 are entered with the same 
data used to enter either the Davis table or the Flanagan table. From them 
the following indices may be obtained: a product moment correlation, the 
Davis discrimination index, an estimated per cent of correct responses in the 
entire sample, and the Davis difficulty index. 

To enter these nomographs the percentage of success on a test item of 
an upper group and the percentage of success on a lower group are needed. 




*The nomographs as presented here are too small for efficient use. Single copies 
of larger-sized reproduction may be obtained by writing the author. 
















As Kelley [9] has shown, these groups should be based on the higher 27% of 
the total sample and the lower 27% of the total sample. (In entering the 
difficulty nomograph these particular percentages are recommended simply 
because they are needed to enter the discrimination nomograph.) 
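A minimal sketch of forming the two groups: rank examinees by total score, take the top and bottom 27%, and compute each group's raw percentage marking the correct answer. The helper names, the rounding of the group size, and the handling of ties are assumptions. The next paragraph recommends a corrected percentage of success instead of this raw one.

```python
def upper_lower_groups(total_scores, fraction=0.27):
    """Return (upper, lower): indices of the highest- and lowest-scoring examinees.

    Group size is round(fraction * N), at least 1; ties at the cut are broken
    arbitrarily by sort order (an assumption of this sketch).
    """
    n_group = max(1, round(fraction * len(total_scores)))
    order = sorted(range(len(total_scores)), key=lambda i: total_scores[i])
    return order[-n_group:], order[:n_group]

def percent_correct(item_responses, group):
    """Raw percentage of a group marking the correct answer (responses coded 1/0)."""
    return 100.0 * sum(item_responses[i] for i in group) / len(group)

# Hypothetical data: 100 examinees; the item is answered correctly by the top half.
totals = list(range(100))
item = [1 if t >= 50 else 0 for t in totals]
upper, lower = upper_lower_groups(totals)
print(len(upper), percent_correct(item, upper), percent_correct(item, lower))  # 27 100.0 0.0
```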

The percentage of success of each of the two groups can be determined 
by calculating the per cent of each group that marks the correct answer, but 
Davis ([12], p. 277) has shown that the number of examinees who actually 
know the answer is a more fundamental estimate; therefore, it is recommended 
that the percentages of success of each group be determined by the 
formula suggested by Davis ([3], p. 6): 


pw B= WK - 0) 





N — NR 
where
P = the proportion of the sample that knows the answer to an item,
R = the number of testees in the sample who answer the item correctly,
W = the number of testees in the sample who answer the item incorrectly,
N = the number of testees in the sample,
NR = the number of testees in the sample who do not reach the item within the time limit,
K = the number of choices in the item.
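The formula for a single group is sketched below. `proportion_knowing` is a hypothetical name, and the input checks are added assumptions rather than part of Davis' formula. The term W/(K - 1) is the usual correction for chance. If testees who do not know the answer guess at random among K choices, they are expected to produce one right answer for every K - 1 wrong ones. When the formula is applied to the upper and lower 27% groups, N is the size of each group. In small groups P can come out negative; the sketch leaves such values unclipped.

```python
def proportion_knowing(R: int, W: int, N: int, NR: int, K: int) -> float:
    """Chance-corrected proportion P = (R - W/(K - 1)) / (N - NR)."""
    if K < 2:
        raise ValueError("an item needs at least two choices")
    reached = N - NR  # testees who reached the item within the time limit
    if reached <= 0:
        raise ValueError("no testee reached the item")
    if R + W > reached:
        raise ValueError("R + W cannot exceed the number who reached the item")
    # W / (K - 1) estimates how many right answers came from blind guessing.
    return (R - W / (K - 1)) / reached


# Usage: a 4-choice item in a group of 100, of whom 10 did not reach it;
# 60 answered correctly and 30 incorrectly (no omissions).
print(round(100 * proportion_knowing(R=60, W=30, N=100, NR=10, K=4), 1))
# 55.6
```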


The Discrimination Nomograph (Figure 1) 


To obtain the product moment correlation or the item discrimination index, locate the determined percentage of success of the higher group on scale $P_H$ and that of the lower group on scale $P_L$. Lay a straight edge (preferably a transparent one) through these points. The discrimination indices are then read at the point where the straight edge crosses the center scale. The right side of that scale, r, gives product moment correlations, and the left side, DsI, gives the discrimination index.
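The nomograph's r values equal those of Flanagan's table, which is not reproduced here. Flanagan's method is generally described as assuming that the criterion and the ability underlying the item are jointly normally distributed. The sketch below fits a model of that kind numerically. It searches for the correlation r and item threshold t that reproduce the observed proportions passing in the upper and lower 27% groups. All function names are hypothetical, and the integration and root-finding choices are illustrative. The output is not claimed to match Flanagan's table or the nomograph to their printed precision. As a by-product, 1 - Φ(t) gives a model-based estimate of the proportion passing in the entire sample, which is the quantity the difficulty nomograph's p scale supplies.

```python
import math
from statistics import NormalDist


def _phi(x: float) -> float:
    """Standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def _Phi(x: float) -> float:
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def _simpson(f, a: float, b: float, n: int = 200) -> float:
    """Composite Simpson's rule on [a, b] with an even number n of panels."""
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3.0


def _group_success(r: float, t: float, a: float, b: float, fraction: float) -> float:
    """Model proportion passing among examinees whose criterion X lies in [a, b].
    X and the latent item variable Y are standard bivariate normal with
    correlation r; the item is passed when Y > t, so
    P(pass | X = x) = Phi((r*x - t) / sqrt(1 - r^2))."""
    s = math.sqrt(1.0 - r * r)
    return _simpson(lambda x: _phi(x) * _Phi((r * x - t) / s), a, b) / fraction


def tail_success(r: float, t: float, fraction: float = 0.27):
    """Model (P_H, P_L) for the upper and lower `fraction` of the criterion."""
    c = NormalDist().inv_cdf(1.0 - fraction)  # criterion cut for the upper tail
    return (_group_success(r, t, c, 8.0, fraction),
            _group_success(r, t, -8.0, -c, fraction))


def _bisect_decreasing(f, lo: float, hi: float, iters: int = 40) -> float:
    """Root of a decreasing function on [lo, hi]; clamps to an end if none."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if f(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def fit_normal_model(p_high: float, p_low: float, fraction: float = 0.27):
    """Solve for (r, t) reproducing the observed proportions (r clamped to
    +/-0.99); also return the implied whole-sample proportion 1 - Phi(t)."""
    if not (0.0 < p_high < 1.0 and 0.0 < p_low < 1.0):
        raise ValueError("proportions must lie strictly between 0 and 1")
    c = NormalDist().inv_cdf(1.0 - fraction)

    def threshold_for(r: float) -> float:
        # For fixed r, the upper-group proportion falls as the threshold rises.
        return _bisect_decreasing(
            lambda t: _group_success(r, t, c, 8.0, fraction) - p_high, -6.0, 6.0)

    def lower_gap(r: float) -> float:
        # With P_H held fixed, a larger r leaves a smaller lower-group proportion.
        return _group_success(r, threshold_for(r), -8.0, -c, fraction) - p_low

    r = _bisect_decreasing(lower_gap, -0.99, 0.99)
    t = threshold_for(r)
    return r, t, 1.0 - _Phi(t)


# Usage: 70% of the upper group and 30% of the lower group pass the item.
r, t, p = fit_normal_model(0.70, 0.30)
print(f"r = {r:.2f}, estimated whole-sample p = {100 * p:.0f}%")
```

The nested bisection keeps the sketch dependency-free at the cost of speed. A library root finder would do the same job faster.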

A product moment correlation, determined by any method, can be 
directly converted to a discrimination index by locating the r value on the 
r side of the center scale and reading its corresponding value on the DsI side 
of the center scale. Similarly the DsI values can be converted directly to r 
values. 


The Difficulty Nomograph (Figure 2) 

To obtain an estimate of the percentage of correct responses in the entire sample and the difficulty index, locate the determined percentage of success of the higher group on scale $P_H$ and that of the lower group on scale $P_L$. Lay a straight edge through these points. The indices are then read at the point where the straight edge crosses the center scale. The right side of the scale, p, gives an estimate of the percentage of correct responses in the entire sample, and the left side, DI, gives the difficulty index.

[Figure 1. DISCRIMINATION NOMOGRAPH. Outer scales: percentage of success of the higher group ($P_H$) and of the lower group ($P_L$). Center scale: r (right side) and DsI (left side).]

[Figure 2. DIFFICULTY NOMOGRAPH. Outer scales: percentage of success of the higher group ($P_H$) and of the lower group ($P_L$). Center scale: p (right side) and DI (left side).]

A percentage of success determined by any method can be converted directly to a difficulty index by locating its value on the p side of the center scale and reading the corresponding difficulty index on the DI side. Similarly, a difficulty index can be converted directly to an estimated percentage of success for an entire sample.


Other Uses and Limitations 


It has been suggested ([12], p. 298) that “Flanagan’s table has general utility and may be found exceptionally useful in many instances outside the field of test construction where economical approximations of biserial correlations are required.” The r’s obtained from this discrimination nomograph could be used in the same way, since they are identical to the values in Flanagan’s table.

The literature on the use of Flanagan’s and Davis’ tables points out some limitations of each table. Similar limitations would apply to the use of these nomographs.


REFERENCES 


[1] Appel, V. Companion nomograph for testing the significance of the difference between uncorrelated percentages. Psychometrika, 1952, 17, 325-330.

[2] Arnold, J. N. Nomogram for determining validity of test items. J. educ. Psychol., 1935, 26, 151-153.

[3] Davis, F. B. Item-analysis data: their computation, interpretation, and use in test construction. Cambridge: Harvard Univ. Press, 1949.

[4] Dunlap, J. W. Nomograph for computing biserial correlations. Psychometrika, 1936, 1, 59-60.

[5] Dunlap, J. W. and Kurtz, A. K. Handbook of statistical nomographs, tables, and formulas. New York: World Book, 1932.

[6] Fan, C. T. Note on construction of an item analysis table for the high-low-27-per-cent group method. Psychometrika, 1954, 19, 231-236.

[7] Flanagan, J. C. General consideration in the selection of test items and a short method of estimating the product moment coefficient from the data at the tails of the distributions. J. educ. Psychol., 1939, 30, 674-680.

[8] Guilford, J. P. The phi coefficient and chi squares as indices of item validity. Psychometrika, 1941, 6, 11-19.

[9] Kelley, T. L. The selection of upper and lower groups for the validation of test items. J. educ. Psychol., 1939, 30, 17-24.

[10] Kuder, G. F. Nomograph for point biserial r, biserial r, and four-fold correlations. Psychometrika, 1937, 2, 135-138.

[11] Lawshe, C. H. A nomograph for estimating the validity of test items. J. appl. Psychol., 1942, 26, 846-849.

[12] Lindquist, E. F. (Ed.) Educational measurement. Washington: Amer. Council on Educ., 1951.

[13] Thorndike, R. L. Personnel selection: test and measurement techniques. New York: Wiley, 1949.

[14] Thorndike, R. L. Review of Davis, F. B. Item-analysis data: their computation, interpretation, and use in test construction. Psychometrika, 1947, 12, 60-61.
Manuscript received 5/16/58 
Revised manuscript received 7/14/58 











BOOK REVIEW 


Oléron, Pierre. Les Composantes de l’Intelligence d’après les Recherches Factorielles. Paris: Presses Universitaires de France, 1957. Pp. viii + 517.


One of the important consequences for psychology of the founding of a quantitative science of biological variation by Francis Galton and Karl Pearson was that Charles Spearman, in 1904, could present to the field a paper entitled “General intelligence objectively determined and measured.” This paper not only resulted directly in the establishment of factor analysis as a new discipline in psychology, but it is indirectly responsible for Psychometrika and, we must acknowledge, for the livelihood of quite a number of people who read this journal. In the more than fifty years of research and controversy which the paper initiated, factor analysis has become widely accepted as a general method for the study of multivariate relationships and has contributed to the theoretical development of modern statistics. At present the original issues in factor analysis are largely resolved and the field now seems to be in a period of consolidation before entering a second phase. M. Oléron has chosen this very opportune time to present a complete review of contributions in the first phase and to suggest possible directions for research in the second.
Some indication of the thoroughness of the review is given by its length as well as by the 471-item bibliography and the useful descriptive list of 157 psychological tests. The approach is largely historical, somewhat like Dael Wolfle’s Factor Analysis to 1940 (Psychometric Monograph No. 3, 1940), and deals with issues and results without going deeply into methodology. The only recent comparable work in English, Sten Henrysson’s Applicability of factor analysis in the behavioral sciences (Stockholm, 1957), is less detailed and more technical. A similar but less current work is P. Vernon’s Structure of Human Abilities (New York and London, 1950).

The review is organized under three main headings: the two-factor theory, the multiple factor approach, and results. The first two sections are developed largely in terms of the running controversy between the advocates of the two-factor theory of human ability, with Spearman at the forefront, and the opposition headed by Godfrey Thomson and L. L. Thurstone. The author follows the published record of the dispute closely, reproducing the arguments and counterarguments of the period, but not adding much in the way of original criticism. Especially in certain technical matters, such as the assumptions about the null distribution of tetrad differences and residual correlations used in the tests of the number of significant factors, some definitive criticism based on modern statistical theory would have been valuable. The treatment of the two points of view in the controversy is eminently fair, but it becomes apparent that the author favors the Thurstonian methods because of their greater generality.

The section on results collects the widely scattered literature of factorial studies into a well-organized summary of major findings through 1955. Important factors are described and numerical results quoted in some cases, but extensive numerical tabulation of factor loadings is avoided. Critical comments in this section are more extensive and help the reader considerably in judging to what extent these findings contribute to a coherent description of human abilities.

A main theme throughout the review is the question of what the factorial studies contribute to the concept of intelligence. Factorists generally have tried to avoid discussing intelligence, but the wide influence of the Binet scale and the “intelligence quotient” in mental testing practice makes this difficult. Both Spearman and Thurstone eventually took positions on the question, and they are compared in the review. The author points out that Spearman was forced to admit the existence of group factors, although only as minor sources of test variability. Similarly, the strict multi-factor approach of Thurstone was compromised when the primary abilities were found to be correlated. Thurstone interpreted the second-order factor implied by these correlations as general ability acting in varying degrees through the modality of group factors. He put principal emphasis on the group factors, however, and believed that intelligence would be understood in a useful way when the catalogue of group factors was complete.

But it is the opinion of Oléron that the program of factorial exploration envisioned by Thurstone has become bogged down by the discovery of a seemingly endless number of group factors. Evidence is cited indicating that the appearance of group factors, and, more important, the variance attributable to them, depends upon the arbitrary choice of test battery composition and cannot be uniquely determined. As a consequence, factors become increasingly fragmented as batteries are more minutely specialized, and there is no obvious way to bring them into a common perspective. Various attempts to organize the group factors according to their importance in practical terms, such as academic success, are reviewed by the author and rejected as inconclusive and not of sufficient theoretical interest. His alternative suggestion is a more experimental approach intended to characterize intelligence not as a catalogue of abilities but as a certain quality of functional relationship among behaviors required in a variety of tasks. Instead of a search for pure tasks representing isolated abilities, as attempted with factor analysis, the study of intelligence would be closer to that of problem-solving behavior and might profit from studies of both pathological and normal subjects. This point of view apparently reflects the author’s interest in the intellectual capacities of deaf-mutes. His general view is that the factor analytic approach is valuable in identifying the major dimensions of human ability but cannot supply the more detailed factual information necessary to support a substantial theory of intelligence. These facts, he believes, will have to come largely from experimental studies.

R. Darrell Bock


University of North Carolina 











